Hello, our lab is active in the machine learning space. We mostly buy motherboards from a competing brand, but for a change I thought I'd buy an Asrock motherboard, especially given all the good things I've been hearing lately.
Our goal is to have 5-7 GPUs active per node, operating as needed; sometimes all 6-7 cards need to work in tandem. Before shopping for a new motherboard I check the manual for "Above 4G Decoding" support, and until now that has given me great results.
Anyhow, we had a few unused 2011-3 CPUs, so I thought of building a new compute node around the Asrock X99 OC Formula. It is our first Asrock motherboard and the specs checked out, but on my first boot (5 cards, all PCI-E ports occupied) I ran into problems: having "Above 4G Decoding" enabled was not enough, and the system would not boot. I then enabled the "Above 4G Decoding patch" and it finally booted, but more problems followed:
Windows tells me that not enough resources are free to operate the 5th card. That is *exactly* what I would expect it to say on a system that does not support "Above 4G Decoding", which is a shame because the product is advertised as supporting it.
To make a long story short, I've tried a couple of BIOS versions (including the latest), but to no avail. I've also tried both consumer and enterprise versions of Windows (Windows 7/10 or equivalent), again to no avail.
Notably, the one card that always gets disabled is the one running on PCI-E 3 (the black port). It "magically" comes back to life once I disable any other PCI-E port (say, PCI-E 4), and Windows then tells me that no errors are detected on the card. It's also odd that the black port is the one that fails, since it is linked to the M.2 (Ultra) port. That means that when 4 GPUs are installed, PCI-E M.2 modules are not and cannot be detected (there is no warning to say that the Ultra M.2 port gets disabled once all the yellow PCI-E ports are occupied).
Given that it was my idea to use grant money to buy an Asrock motherboard, I'm currently in hot water, so I'd appreciate a swift resolution. That is why I'm posting to the forums first (if that doesn't suffice, I will contact Asrock directly).
Again, I've tried most BIOS revisions (I'm currently running v3.10, the latest) and multiple x64 versions of Windows (no Linux yet, as this workload is best run on Windows); I'm currently on Windows 10 x64. I'm also using the latest Radeon Software version (although I suspect that plays the least role of all).
Naturally, since I'm running a Xeon CPU, all 40 lanes that the CPU provides are available to the motherboard...