This is going to be an issue I doubt many will ever encounter, but as the result is quite catastrophic, I wanted to share. Yes, I'm aware Threadripper is a workstation, not a server CPU - but the honest fact is that AMD and Intel workstation systems are widely used in test/development server workloads, even if not officially supported.
Scenario:
- Multiple systems with X399 Taichi, Threadripper 1920X, 64 GB memory, single Samsung 960 EVO SSD, Intel X520 network card and various basic console grade display controllers
- BIOS 1.80, all defaults except IOMMU disabled
- Windows Server 2016 with all latest updates, running a failover cluster with Hyper-V workload, storage via iSCSI
This setup has worked perfectly so far and Threadripper is by far my favorite current hardware option for these environments. However, the problem occurs when running legacy workloads.
Add to this setup a single virtual machine running Debian 3.1 (from 2005). It has been running on a very recent Xeon system for months without problems. When moved to Threadripper, it starts up just fine, but after two to three hours, it freezes the host system. Being in a failover cluster, the workload moves to the next node, then that freezes as well, until every node has frozen completely and we have a major cluster...issue. It's not a reboot or a blue screen, simply a total system freeze that only recovers with the reset button. Nothing meaningful is logged in the event log.
Now, obviously, a 2005 OS and kernel 2.6.8 have no idea about Threadripper. It would make sense that it would not even boot. However, I find it dangerous that it completely freezes the system. Running systems this old is a bad idea, obviously, but there are cases when legacy workloads need to be simulated. Being able to crash the hardware with old software is, simply put, a bad thing.
For obvious reasons, I prefer not to test this setup too many times. It may be just a test/development environment, but I still dislike total cluster crashes. I've relocated the old workload to Intel hardware for now. Most of all, I wanted to bring this to the attention of anyone curious, as this MAY be a sign of a more serious flaw.
-------------
Evaluating X399 + Threadripper systems for light server use.