ASRock.com Forum > Technical Support > AMD Motherboards

ryzen 1700 on taichi x370 no OC random hard lock

ioio (Newbie, joined 15 Jul 2017)
Posted: 15 Jul 2017 at 8:04am
CPU / motherboard:  Ryzen 1700 on Taichi X370
BIOS adjustments:  SMT disabled, OPcache disabled
BIOS version:  2.40.  I noticed a new one came out today.  I'll try it out next week.
System:  Ubuntu Server 16.04 64-bit + HWE + mainline kernel 4.11.9
Memory:  Crucial 16GB x2 (CT2K16G4DFD8213) operating in dual-channel mode
PSU:  New EVGA 750 GQ (GQ-0750-V1)
Storage:  Root and boot installed on an ADATA SU800 M.2 2280 128GB (ASU800NS38-128GT-C).  No RAID.  Additional storage drives are installed; I can provide details if requested.
Video:  NVIDIA Quadro NVS 420 (the issue still occurs with no video card installed)
Temps:  I've not done a detailed analysis of the CPU temps.  I can't check them remotely yet (I'm usually remote).  GPU core is 41C.

I'm experiencing hard lockups after 3-9 hours of runtime.  The screen goes black, the keyboard lights turn off, and the network goes away.  The only option is a power cycle.  I see no hardware errors in syslog, just some complaints from Docker and VirtualBox about the mainline kernel.  After one crash I noted a Dr. Debug code of 90.  I'll take more samples next week.
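
Because a hard lock like this leaves nothing useful in syslog, one way to pin down when each freeze happens is a small heartbeat logger that appends a timestamp to a file and forces it to disk. The following is only a minimal sketch and not something the original poster ran; the file name `heartbeat.log`, the function names, and the 60-second interval are all assumptions.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp line and force it onto disk."""
    now = now or datetime.now(timezone.utc)
    line = now.isoformat(timespec="seconds") + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush so the line survives a hard lock
    return line.strip()


def last_heartbeat(path):
    """Return the last timestamp recorded before the crash, or None."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = [ln.strip() for ln in f if ln.strip()]
    except FileNotFoundError:
        return None
    return datetime.fromisoformat(lines[-1]) if lines else None


def run(path="heartbeat.log", interval_s=60):
    """Loop forever; intended to be started from a service or a screen session."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

After a power cycle, `last_heartbeat("heartbeat.log")` gives the approximate time of the lockup to within one interval, which can then be compared against the 3-9 hour pattern.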

The system does not lock up when I schedule a reboot every 3 hours in cron.  It has been running this way for 1.5 days since the cron job was set up.

No attempt has been or will be made to overclock the hardware.  The BIOS config is stock except for SMT disabled and OPcache disabled.

Load on the system is very light (as seen with htop).  Current roles are:
  >  LVM managing 4 4TB disks
  >  recording 8 512k video streams over NFS
  >  constant file sync + compression with 2 other servers (syncthing)
  >  file sharing via samba domain membership 

While researching, I found several close-sounding matches to my issue that seem to have been resolved with a kernel and BIOS upgrade, both of which I've tried.  I also see that programmers are running into trouble when compiling, but I've not tried that.  The majority of hits relate to overclocking, which I'm not interested in at this time.  I would be willing to adjust or underclock to resolve this issue.

I'm hoping to overcome these crashes!  Early next week I'm planning to reseat/swap/test out memory, reflash/reset bios, and try an alternative NIC.  I'd appreciate any suggestions or advice to resolve the issue.

thanks


Edited by ioio - 15 Jul 2017 at 8:26am
MisterJ (Senior Member, joined 19 Apr 2017)
Posted: 15 Jul 2017 at 8:29am
ioio, I cannot find your memory in the qualified list for the Taichi.  What slots are you using (A2 & B2)?  Have you done a Load UEFI Defaults?  I am curious why you are running with SMT and OPcache disabled.  I had a similar problem (see signature for specs) and it turned out to be bad memory slots A1 and A2; a MB RMA solved the problem.  Have you opened a ticket with ASRock?  If not, please do.  Can you beg, borrow or steal some memory on the qualified list?  For testing, try one stick in A1 and, if it fails, try A2.  Continue until all slots are tested and, if all fail, then try the other stick.  Good luck and enjoy, John.
Fat1 X399 Pro Gaming, TR 1950X, RAID0 3xSamsung SSD 960 EVO, G.SKILL FlareX F4-3200C14Q-32GFX, Win 10 x64 Pro, Enermx Platimax 850, Enermx Liqtech TR4 CPU Cooler, Radeon RX580, BIOS 2.00, 2xHDDs WD
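
The one-stick-at-a-time procedure described in this post can be written out as a checklist. This is only an illustrative sketch: the stick labels and function names are made up, and the slot names simply follow the A1/A2/B1/B2 labels used in this thread.

```python
from itertools import product

SLOTS = ["A1", "A2", "B1", "B2"]      # slot names as used in this thread
STICKS = ["stick 1", "stick 2"]       # hypothetical labels for the two DIMMs


def single_stick_plan(sticks=STICKS, slots=SLOTS):
    """List (stick, slot) runs: go through every slot with one stick,
    then repeat with the next stick."""
    return list(product(sticks, slots))


def never_passed(results, index):
    """Return sticks (index=0) or slots (index=1) that failed every run.

    `results` maps (stick, slot) to True if that run stayed stable."""
    seen = {}
    for key, passed in results.items():
        seen.setdefault(key[index], []).append(passed)
    return sorted(name for name, runs in seen.items() if not any(runs))
```

Recording each run as stable or not and then calling `never_passed(results, 1)` points at slots that failed with every stick tried, which is the pattern that would suggest a board problem rather than a bad module.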
ioio
Posted: 16 Jul 2017 at 3:50am
John,

Thank you for sharing your experience.  I just put some qualified RAM on order  (HX421C14FBK2/8).  I should have it to test on Tuesday.  On Monday I'll try other slot configurations as you suggest.  

I went with non-qualified RAM because it's ECC at a decent price.   The 2 qualified ECC modules are scarce and expensive.

I disabled SMT and OPcache because my research found there might be problems with them on Linux.  AMD is giving that instruction to some users in their support forums.  It's affecting programmers with specific workloads.  I don't think I'm affected by this issue, but I tried turning them off regardless.  I did have to upgrade to an unsupported kernel to get a stable system; I think that is a must for all Ubuntu LTS/Ryzen users because of Ryzen's SMT implementation.  I will reset back to defaults on Monday and continue testing.
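
When switching between distribution and mainline kernels, it can help to confirm which one actually booted. Here is a minimal sketch that parses a kernel release string and compares it against a chosen minimum; `kernel_tuple` and `at_least` are hypothetical helper names, and any threshold passed in is the reader's own choice rather than an official requirement.

```python
import re


def kernel_tuple(release):
    """Turn a release string such as '4.11.9-041109-generic' into (4, 11, 9)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '4.10') is treated as 0.
    return tuple(int(g or 0) for g in m.groups())


def at_least(release, minimum):
    """True if `release` is at or above `minimum`, e.g. (4, 10)."""
    return kernel_tuple(release) >= tuple(minimum)
```

On a running system the release string comes from `platform.release()` in the standard library (the same value `uname -r` prints), so `at_least(platform.release(), (4, 10))` is one way to check it.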

I'll open a ticket with ASRock support.  

I'll post back if it gets resolved.  Thanks again for the help.
MisterJ
Posted: 16 Jul 2017 at 4:10am
Thanks for the update, ioio.  I did not realize your RAM was ECC.  Few here use it, I suspect.  You may be exploring almost virgin territory.  Hopefully you can get a refund on one memory kit or the other.  I'll keep an eye out for updates.  Thanks and enjoy, John.
wardog (Moderator Group, joined 15 Jul 2015)
Posted: 16 Jul 2017 at 9:56am
ioio wrote:

On Monday I'll try other slot configurations as you suggest.


A2 and B2 are stated in your manual and online on the MB's Specification tab as a table beside Memory.

Crucial says they are compatible, BUT they are not ECC sticks.  Both Crucial and other sites confirm this.

http://www.crucial.com/usa/en/x370-taichi/CT9993048#productDetails

Product Specifications
Brand:  Crucial
Form Factor:  UDIMM
Total Capacity:  32GB Kit (16GBx2)
Warranty:  Limited Lifetime
Specs:  DDR4 PC4-17000, CL=15, Dual Ranked, x8 based, Unbuffered, NON-ECC, DDR4-2133, 1.2V
Series:  Crucial
ECC:  NON-ECC
Module Qty:  2
Speed:  2133 MT/s
Voltage:  1.2V
DIMM Type:  Unbuffered
ioio
Posted: 16 Jul 2017 at 11:26am
Good catch, wardog.  Somehow I got the impression that it was ECC.  Oops!  It looks like I need CT9993113 for ECC.  I'll try that if I can get the system stable on certified modules.

This saves me time down the road, when I would otherwise have been trying to figure out why ECC tests fail.

This system is recording and syncing files non-stop so I would like the extra protection ECC provides.

Thanks! 
ioio
Posted: 18 Jul 2017 at 4:58am
This morning I applied BIOS update 3.00 and reset the UEFI to defaults.  The system has been stable under load for 6.5 hours.  I haven't seen it pass 10 hours without crashing, so I'll have a better idea of the status tomorrow.

Edited by ioio - 18 Jul 2017 at 5:06am
MisterJ
Posted: 18 Jul 2017 at 8:01am
Thanks, ioio.  BIOS 3.00 does seem to help in the memory area, judging from the posts here.  I assume you continue to run your initial RAM.  Enjoy, John.
ioio
Posted: 28 Jul 2017 at 11:49pm
I think the system is running stable now.  I'm currently at 50 hours of uptime.  I think the key adjustment was to disable Global C-state Control in the BIOS.  Here are some notes:
  • I have not yet installed the certified RAM.  I'm still using the CT2K16G4DFD8213.
  • I tried different RAM configurations.  For this 50-hour run, I had one stick in slot B2.
  • I just booted with the recommended RAM configuration of A2 & B2 to see if it is stable with Global C-state Control disabled.
  • Upgrading to BIOS 3.00 did not fix the crashing.
  • I found some threads on Reddit from Debian users with similar issues who reported success after disabling Global C-state Control.
  • I'm running mainline kernel 4.11.9-041109-generic.  Kernel 4.10 was just released to Ubuntu 16.04 with HWE activated.  I will try switching to that if things stay stable.

I'll report back on the c-state fix after more verification. 
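
For anyone wanting to see from Linux which CPU idle states the kernel exposes before and after a BIOS change like this, the cpuidle sysfs tree can be read directly. This is a rough sketch under the assumption that the running kernel provides the usual `cpuidle/stateN/{name,usage,time}` files under `/sys/devices/system/cpu`; the function name and the `root` parameter are illustrative additions, not anything used in this thread.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")  # usual Linux sysfs location


def read_idle_states(cpu=0, root=CPU_ROOT):
    """Return [(name, usage_count, time_us)] for each idle state of one CPU.

    Returns an empty list when the cpuidle directory is absent."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        return []
    states = []
    # Sort numerically so state10 comes after state2.
    for d in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (d / "name").read_text().strip()
        usage = int((d / "usage").read_text())
        time_us = int((d / "time").read_text())
        states.append((name, usage, time_us))
    return states
```

Comparing the output of `read_idle_states()` across reboots may show whether deeper states are still listed or still accumulating time; exactly which states appear depends on the kernel's idle driver, so treat the result as a hint rather than proof.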
ioio
Posted: 31 Jul 2017 at 11:01pm
No crashes since I disabled Global C-state Control.  Approaching 60 hours with the same RAM configuration I was using when I started this thread.