
Short-circuit in B550M Steel Legend

Printed From: ASRock.com
Category: Technical Support
Forum Name: AMD Motherboards
Forum Description: Question about ASRock AMD motherboards
URL: https://forum.asrock.com/forum_posts.asp?TID=26679
Printed Date: 21 May 2024 at 8:37pm
Software Version: Web Wiz Forums 12.04 - http://www.webwizforums.com


Topic: Short-circuit in B550M Steel Legend
Posted By: a13antichrist
Subject: Short-circuit in B550M Steel Legend
Date Posted: 22 Sep 2023 at 6:02am
I have had my B550M Stl.Lgd for about 24 months and it has constantly been plagued by random reboots. Mostly it is just flat-out system reset with no warning or blue screen or anything.
For the past few weeks it had been giving a WHEA_UNCORRECTABLE_ERROR blue screen whose % counter never advanced, so I always had to hard-reset manually when that happened.
I had suspected a short-circuit somewhere because every now and again when I plug a USB pen drive into the rear, the whole thing shuts down instantly. Not very often, but it happened enough to be noteworthy. This is when I got suspicious about the board.
On top of that, it would almost -always- crash/reset if I had Polychrome open for long enough. So I got pretty good at flipping it open, configuring quickly and closing it again. I do have a lot of RGB in here and so I thought maybe I had hooked something up incorrectly or crossed a wire.
But I have since disconnected all the RGB (unplugged the ARGB headers) and the issue continues.
Now on a recent crash my 2TB PCIe Gen4 SSD (installed in the under-the-Armor/GPU slot) didn't come back up again. It's no longer recognised in any slot, case or reader. It's fried. Not nice.

So I ordered a new SSD, thinking, well ok, maybe SSD issue, that's what was causing the crashes.

But no. The new SSD (different model) crashes also. Only 3 times so far, but that's in a week, so not exactly rare either.

There is one curiosity I have noticed: HWiNFO reports wayward readings for the metric Power Reporting Deviation (Accuracy).
At the moment it looks like this:
Current -- Minimum -- Maximum -- Average
64.4% -- 50% -- 141.3% -- 67.6%

For info the max CPU core temp at this point is 64°.

I have no overclock in the system, except that I am using 3600MHz RAM with resulting Infinity Fabric matching, I guess. But I have also run the RAM at 2666 just to be sure and the reset still happens.

The specs are:
Ryzen 9 5950X
RTX 3090 FE (but had a 3080 originally and had the same issues)
64GB (4x16GB) 3600MHz, full-auto
Intel X520 10G NIC
2TB main M.2 under Armor/GPU + 4TB SSD in second/PCIe Gen3
Lian Li O11D mini
and well 14x 120mm fans. :E
PSU is CoolerMaster V850 Gold SFX.
Also have Gelid Extender cables on both the GPU (2x) and 24pin ATX.

If I had a spare PSU, I would test swapping that out, but I don't, and anyway there is nothing SFX over 850 that is remotely good value so it would be cheaper to swap out the MB in that case. Though less convenient obviously.

So the question is are there any known issues with this board, any other reports of similar issues? Or any clues here that someone can recognize as potentially indicative?
Appreciate any thoughts.


-------------
R9 5950x | Asrock B550M Steel Legend | 64Gb 3600Mhz | RTX 3090 FE
R3 2200G | ASRock Fatal1ty AB350 ITX

Dell Latitude 7470 | QNAP @58Tb | Mikrotik routers/wifi



Replies:
Posted By: eccential
Date Posted: 22 Sep 2023 at 7:49am
I'm not aware of any known issues with the board. I have one and it's been perfectly reliable, like all my AsRock (and one AsRockRack) systems. I built 9 of them.

The one with the B550M Steel Legend is my most powerful system, with 5800X3D, 64GB, four SSDs (two NVMe, two SATA), and two BD-RW drives. My GPU is just a RX-6600. All powered by a Seasonic Fanless Prime PX-500.

One thing I noticed is your RAM setup. You have four DIMMs at 3600. That's probably kind of hard on the memory controller. I only have two DIMMs (32+32), and they're 100% JEDEC-spec 3200MT/s ECC sticks. JEDEC-spec, so 22-22-22 timing. LOL

If you suspect shorting somewhere, I'd remove everything and rebuild, inspecting each component as I re-seat it. Watch out for motherboard standoffs that shouldn't be there and whatnot.

I've had PCIe devices not working due to dirty contacts, but they just don't work at all, rather than intermittent issues. A wipe with alcohol fixes them.


Posted By: a13antichrist
Date Posted: 24 Sep 2023 at 1:55am
I've been leaning towards a rebuild but I have three fans on a tower CPU cooler and it was *really* a b*tch to get it all mounted so I'm trying to avoid a full disassembly heh.
4 DIMMs @3600 could be a problem you reckon? You know I did suspect memory initially.. just because an immediate power-off usually ties to memory fault. But the thing with the USB drive.. that *has* to be electrical.
The other thing is that, at least before my latest refresh (new OS), the main time it would hang is during high GPU load.. which correlates with high memory usage but also potentially the PSU, since the card is demanding at high power.
So many variables and no way to really test any of them. :/ Or is it possible to disable memory banks in BIOS? I don't recall seeing the option.





Posted By: eccential
Date Posted: 24 Sep 2023 at 4:13am
You can always try down-clocking (urr, normal-clocking) the RAM.
No need to disassemble anything, since it's just a BIOS setting.

*OFFICIALLY*, the max supported speed for 4 DIMMs is 2667MT/s, unless all 4 DIMMs are single-rank. Then, 2933MT/s is officially supported.

Most recent 16GB DIMMs are single-rank, because DRAM density has gone up over the years. If the vendor doesn't give you a good spec, the rule of thumb is: if it has DRAM chips on both sides, it's dual-rank. There should be 16 DRAM chips total (2Rx8), or 18 chips if ECC.

If DRAM chips are only on one side, it's likely single-rank (1Rx8).

Technically, 2Rx16 (8 DRAM chips, each at 16-bits wide) might be possible, but I've never seen such a monstrosity. 1Rx16 DIMMs are horrible for performance.
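
The chip-count rule of thumb above can be written out as a small calculation. This is only a sketch, assuming a standard DDR4 rank is 64 data bits wide (72 with ECC); the function names are made up for illustration, and the speed figures are simply the ones quoted in this post, not looked up from AMD.

```python
# Rule-of-thumb rank arithmetic for DDR4 DIMMs (illustrative sketch only).
DATA_BITS_PER_RANK = 64  # assumption: a rank presents a 64-bit data bus
ECC_BITS_PER_RANK = 8    # assumption: ECC DIMMs add 8 check bits per rank


def estimate_ranks(chip_count: int, chip_width: int, ecc: bool = False) -> int:
    """Estimate the number of ranks from the DRAM chip count and chip width."""
    bits_per_rank = DATA_BITS_PER_RANK + (ECC_BITS_PER_RANK if ecc else 0)
    if bits_per_rank % chip_width != 0:
        raise ValueError("chip width does not evenly fill a rank")
    chips_per_rank = bits_per_rank // chip_width
    if chip_count % chips_per_rank != 0:
        raise ValueError("chip count is not a whole number of ranks")
    return chip_count // chips_per_rank


def official_max_speed_4_dimms(all_single_rank: bool) -> int:
    """MT/s officially supported with four DIMMs, as quoted in the post above."""
    return 2933 if all_single_rank else 2667


print(estimate_ranks(16, 8))            # 16 x8 chips, both sides populated
print(estimate_ranks(8, 8))             # 8 x8 chips, one side populated
print(estimate_ranks(18, 8, ecc=True))  # 18 x8 chips on an ECC DIMM
print(estimate_ranks(8, 16))            # 8 x16 chips
```

The last call is the 2Rx16 oddity: only 8 chips, yet still two ranks, which is why counting chips alone can mislead when the chip width is unknown.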


Posted By: a13antichrist
Date Posted: 25 Sep 2023 at 2:53am
Originally posted by eccential:

You can always try down-clocking (urr, normal-clocking) the RAM.


Quote
But I have also run the RAM at 2666 just to be sure and the reset still happens.




My RAM is Crucial Ballistix 16-18-18-38. I do notice it says 1.35V while I seem to recall seeing 1.2 in BIOS. Will check this further.

One thing I found today though after another random reset entirely while the computer was idle:


EventID 18: WHEA Error Logger.

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

The details view of this entry contains further information.

This might align with the Power Reporting imbalances I mentioned above.
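
To check whether every crash really leaves the same entry behind, the pasted event text can be reduced to a short signature and counted. A minimal sketch, assuming each event is copied out of Event Viewer as plain "Key: value" lines like the one above; `parse_whea_event` and `crash_signature` are hypothetical helper names, not part of any Windows tool.

```python
from collections import Counter


def parse_whea_event(text: str) -> dict:
    """Turn the 'Key: value' lines of a pasted event into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def crash_signature(fields: dict) -> tuple:
    """Keep only the fields that identify which part reported what."""
    return (
        fields.get("Reported by component"),
        fields.get("Error Type"),
        fields.get("Processor APIC ID"),
    )


# One pasted event; in practice this list would hold one string per crash.
events = [
    """\
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0
""",
]

signatures = Counter(crash_signature(parse_whea_event(e)) for e in events)
for signature, count in signatures.items():
    print(count, signature)
```

If the same component, error type and APIC ID dominate the count across many crashes, that points at one part rather than at random noise.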



Posted By: eccential
Date Posted: 25 Sep 2023 at 4:15am
Well, all that says is there was an uncorrectable hardware error.

If you disable "Automatically restart" in the "Start up and Recovery" setting, you might actually see the BSOD, not that it would help you.

And if you have it dump memory, somebody might be able to help pinpoint the actual error. But even that might be useless if you're getting a different and unrelated error each time.

Even after just a quick online search, I see all kinds of different things can cause the MCE, and therefore the fixes are just as diverse. One guy fixed it with a new power supply. Others replaced different parts.

Even re-seating things (CPU, PCIe cards) might help. I've personally fixed a system by cleaning the motherboard socket once. I flooded the empty socket with CRC Electronics cleaner and wiggled the lever around to make sure all the wires get touched by the cleaner. But this was an obvious situation, as someone (not me) let thermal paste fall into the socket.

Anyway, almost anything and everything can cause MCE. So I don't think a random nobody like me can really help guide you here.

One of the reasons why all my PC builds use ECC memory is to eliminate unknowns, if only just one.


Posted By: a13antichrist
Date Posted: 30 Sep 2023 at 7:14am
Originally posted by eccential:

Well, all that says is there was an uncorrectable hardware error.

If you disable "Automatically restart" in the "Start up and Recovery" setting, you might actually see the BSOD, not that it would help you.


I see the BSOD, at least did before the SSD assassination. It says WHEA_Uncorrectable_Error ;) You're right, nothing helpful heh.

But every crash seems to be followed by this same Processor error. And I do see 'power reporting' wayward figures. So that seems correlated.

On the other hand a 'live' line on the MB to the case that resets the machine if I touch the USB stick to it.. that could literally manifest as any issue at all.

Originally posted by eccential:


Even after just a quick online search, I see all kinds of different things can cause the MCE, and therefore the fixes are just as diverse. One guy fixed it with a new power supply. Others replaced different parts.


Many things could lead to an MCE. But how many things can lead to an MCE *and* a live case?


Originally posted by eccential:


One of the reasons why all my PC builds use ECC memory is to eliminate unknowns, if only just one.


I usually keep spare parts of everything on hand to check/test with. Saves the fluffing around. But I'm trying to go minimal now, I'm even getting rid of my stock of RGB :D



Posted By: Skybuck
Date Posted: 09 Oct 2023 at 2:57am
I believe this may be some Windows 11 related error and might also require some kind of BIOS update... not sure... I think MSI motherboards had this issue... strange to see it on ASRock as well?

There might also be a Windows 11 update/fix for this...


Posted By: a13antichrist
Date Posted: 14 Oct 2023 at 5:32am
So I did a bit more playing around.. ripped everything out.. re-seated a few MB standoffs.. not sure that did much.. but I can't seem to crash it now with the USB stick, that's a good sign heh.

I started running OCCT and Memory, GPU, GPU Mem all passed.. then I tried CPU. This is what I got:

00:00:00 - Info - Test schedule started at 2023-10-13 22:50:42
00:00:00 - Info - CPU - Started (Duration : 01:00:00)
00:22:25 - Error - CPU - 1 error(s) found on physical core #13 - logical core #26
00:22:26 - Error - CPU - 4 error(s) found on physical core #13 - logical core #26
00:22:27 - Error - CPU - 4 error(s) found on physical core #13 - logical core #26
00:22:28 - Error - CPU - 4 error(s) found on physical core #13 - logical core #26
00:22:29 - Error - CPU - 4 error(s) found on physical core #13 - logical core #26
00:22:30 - Error - CPU - 5 error(s) found on physical core #13 - logical core #26
00:22:31 - Error - CPU - 3 error(s) found on physical core #13 - logical core #26
00:22:32 - Error - CPU - 4 error(s) found on physical core #13 - logical core #26
00:22:33 - Error - CPU - 4 error(s) found on physical core #13 - logical core #26
....
....
....
00:26:34 - Error - CPU - 4 error(s) found on physical core #13 - logical core #26
00:26:34 - Info - CPU - Test stopped


So.. that looks.. informative. Can I draw anything from that? Or maybe bad test? Am new to OCCT so not sure if this is conclusive...
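
One way to read a log like the one above is to add up the reported errors per core and see whether they cluster. A rough sketch, assuming the log lines keep exactly the format pasted here; the regular expression and the `tally_errors` name are illustrative and not anything provided by OCCT.

```python
import re
from collections import Counter

# Matches lines such as:
# 00:22:25 - Error - CPU - 1 error(s) found on physical core #13 - logical core #26
ERROR_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) - Error - CPU - (\d+) error\(s\) found on "
    r"physical core #(\d+) - logical core #(\d+)"
)


def tally_errors(log_text: str) -> Counter:
    """Sum the reported error counts per (physical core, logical core) pair."""
    totals = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            _, count, physical, logical = match.groups()
            totals[(int(physical), int(logical))] += int(count)
    return totals


# A few lines copied from the log above; Info lines are ignored.
sample = """\
00:00:00 - Info - CPU - Started (Duration : 01:00:00)
00:22:25 - Error - CPU - 1 error(s) found on physical core #13 - logical core #26
00:22:26 - Error - CPU - 4 error(s) found on physical core #13 - logical core #26
00:22:30 - Error - CPU - 5 error(s) found on physical core #13 - logical core #26
"""

for (physical, logical), total in tally_errors(sample).items():
    print(f"physical core #{physical}, logical core #{logical}: {total} errors")
```

If every error lands on a single (physical, logical) pair, as it does in the excerpt above, that suggests one specific core is failing rather than a system-wide fault, though it does not by itself say whether the CPU, its power delivery or something else is to blame.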



Posted By: a13antichrist
Date Posted: 14 Oct 2023 at 7:14am
It just flash-reset again, literally while doing nothing but reading over my post above.



Posted By: eccential
Date Posted: 14 Oct 2023 at 7:33am
My guess.
There was some sort of shorting at some point.
Repeated shock damaged CPU.
Considering you're getting multiple errors on the same thread, I'd say you should get a new CPU. Seeing as there's no way to know for sure if the CPU was bad from the get go, start a warranty process with AMD and see where it goes.


