
Workaround for ASRock random reboots

johnwbyrd (Newbie, joined 14 Oct 2024)
Posted: 10 Dec 2024 at 4:41am
Here are some breadcrumbs for anyone debugging random reboot issues on Proxmox 8.3.1 or later.

tl;dr: If you're experiencing random, unpredictable reboots on a Proxmox rig, try DISABLING (not leaving at Auto) the Core Watchdog Timer in the BIOS.

I have built a Proxmox 8.3 rig with the following specs:

- CPU: AMD Ryzen 9 7950X3D 4.2 GHz 16-Core Processor
- CPU Cooler: Noctua NH-D15 82.5 CFM CPU Cooler
- Motherboard: ASRock X670E Taichi Carrara EATX AM5 Motherboard
- Memory: 2 x G.Skill Trident Z5 Neo 64 GB (2 x 32 GB) DDR5-6000 CL30 Memory
- Storage: 4 x Samsung 990 Pro 4 TB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive
- Storage: 4 x Toshiba MG10 512e 20 TB 3.5" 7200 RPM Internal Hard Drive
- Video Card: Gigabyte GAMING OC GeForce RTX 4090 24 GB Video Card
- Case: Corsair 7000D AIRFLOW Full-Tower ATX PC Case - Black
- Power Supply: be quiet! Dark Power Pro 13 1600 W 80+ Titanium Certified Fully Modular ATX Power Supply ($448.72 @ Amazon)

After I updated this rig to the latest Proxmox and set up GPU passthrough as documented at https://pve.proxmox.com/wiki/PCI_Passthrough , the system would randomly reboot under load with no indication of why. Nothing in the Proxmox system log indicated that a hard reboot was about to occur; it simply happened, and the system would come back up immediately and attempt to recover the filesystem.
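
For anyone wanting to confirm that a reboot was a hard reset rather than an orderly shutdown, here is a minimal Python sketch. It assumes a persistent systemd journal, so that `journalctl -b -1` can show the previous boot. The function names and the marker strings are illustrative assumptions, not anything taken from Proxmox; the exact shutdown messages vary between systemd versions, so adjust the markers to whatever a known clean reboot logs on your system.

```python
import subprocess

# ASSUMPTION: substrings that typically appear near the end of the journal
# after an orderly shutdown. The wording differs between systemd versions.
CLEAN_SHUTDOWN_MARKERS = (
    "Journal stopped",
    "System Shutdown",
    "System Reboot",
    "System Power Off",
)


def ended_cleanly(log_lines, markers=CLEAN_SHUTDOWN_MARKERS):
    """Return True if any log line contains one of the shutdown markers."""
    return any(marker in line for line in log_lines for marker in markers)


def previous_boot_tail(line_count=200):
    """Return the last journal lines recorded during the previous boot.

    Hypothetical helper: needs a persistent journal and permission to read it.
    """
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-n", str(line_count), "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

If `ended_cleanly(previous_boot_tail())` returns False after an unexpected reboot, that is consistent with the abrupt resets described here, although it says nothing about what caused them.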

At first I suspected the PCI Passthrough of the video card, which seems to be the source of a lot of crashes for a lot of users. But the crashes were replicable even without using the video card.

After an embarrassing amount of bisection and testing, it turned out that this particular motherboard (ASRock X670E Taichi Carrara) has a BIOS setting at Advanced\AMD CBS\CPU Common Options\Core Watchdog\Core Watchdog Timer Enable. Its default value (Auto) seems to ENABLE the Core Watchdog Timer, which causes sudden reboots at unpredictable intervals on Debian, and hence on Proxmox as well.

The workaround is to set the Core Watchdog Timer Enable setting to Disable. In my case, that caused the system to become stable under load.

Because of these types of misbehaviors, I now only use ZFS as the root file system for Proxmox. ZFS played like a champ through all these random reboots and never once corrupted filesystem data.

In closing, I'd like to send shame to ASRock for sticking this particular footgun into the default settings in the BIOS for its X670E motherboards. Additionally, I'd like to warn all motherboard manufacturers against enabling core watchdog timers by default in their respective BIOSes.

RealPjotr (Newbie, joined 12 Dec 2024)
Posted: 12 Dec 2024 at 9:28pm
Thanks, I have the exact problem you describe, but on a SIENAD8-2L2T motherboard with an EPYC CPU. I also run the latest Proxmox and have had random reboots. At first they were more frequent and often reported M.2 hardware errors. That turned out to be WD SN770 SSDs not playing well with ZFS. I have replaced them with Samsung 990 Pros, and they've been stable for 2-3 weeks, much better.

But today I had a completely unexpected reboot with nothing in any logs: system log, journalctl, or IPMI. Googling, I found your post, so I've changed this setting and hope it helps. Time will tell!

SilentCPU (Newbie, joined 24 Jan 2025, London)
Posted: 24 Jan 2025 at 2:22am
I am experiencing a similar issue, with next to no information on the web apart from your post.

It began with my machine randomly switching off completely. I thought I had overloaded it at the time, but then the powering off just turned into random reboots.

Now it's rebooting every 20 minutes. No information in journalctl or any other logs.

I'm going to try your suggestion and see if it helps. And I will report back so that the next wandering soul can get some certainty on what to do next!
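
To put a number on an observation like "rebooting every 20 minutes", here is a small Python sketch that measures the gaps between boot times and checks how regular they are. It assumes you have already collected the boot timestamps yourself (for example by reading them off the boot list in the journal); the function names and the 10% tolerance are hypothetical choices for illustration. A regular interval is only a hint about the cause, not proof of anything.

```python
from datetime import datetime
from statistics import mean, pstdev


def uptime_gaps(boot_times):
    """Return the time between consecutive boots, oldest first."""
    ordered = sorted(boot_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(boot_times, tolerance=0.1):
    """Heuristic: True if the gaps vary by at most `tolerance` of their mean.

    ASSUMPTION: the 10% default is arbitrary; tune it for your own data.
    """
    gaps = [gap.total_seconds() for gap in uptime_gaps(boot_times)]
    if len(gaps) < 2:
        return False  # too few reboots to say anything
    average = mean(gaps)
    if average == 0:
        return False  # identical timestamps carry no interval information
    return pstdev(gaps) / average <= tolerance


# Made-up example timestamps, roughly 20 minutes apart.
example_boots = [
    datetime(2025, 1, 24, 1, 0),
    datetime(2025, 1, 24, 1, 20),
    datetime(2025, 1, 24, 1, 40),
    datetime(2025, 1, 24, 2, 1),
]
regular = looks_periodic(example_boots)
```

Reboots that come at near-identical intervals regardless of load point in a different direction than reboots that only happen under heavy load, so this is worth checking before changing more settings.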