Taichi X370 with Ubuntu idle lock ups/idle freeze

Printed From: ASRock.com
Category: Technical Support
Forum Name: AMD Motherboards
Forum Description: Question about ASRock AMD motherboards
URL: https://forum.asrock.com/forum_posts.asp?TID=5963
Printed Date: 26 Dec 2024 at 10:32am
Software Version: Web Wiz Forums 12.04 - http://www.webwizforums.com


Topic: Taichi X370 with Ubuntu idle lock ups/idle freeze
Posted By: HxwRXP
Subject: Taichi X370 with Ubuntu idle lock ups/idle freeze
Date Posted: 01 Sep 2017 at 2:20pm
I'm having quite a huge problem with this board. Well, I'm assuming it's the board, because that's the only thing I have upgraded, together with the RAM and CPU.

Specs are:
CPU: Ryzen 7 1800x
RAM: G.Skill FlareX for AMD Ryzen DDR4 32GB (16GBx2) 2400 CL16 (F4-2400C16D-32GFX), DR, in slots A2 and B2
SSD: over 2 years old
PSU: 1200W, over 2 years old

I have the latest BIOS, version 3.00, downloaded directly from within the BIOS.

Everything works well while I'm using Ubuntu, until I leave it and come back 6 hours later: I try to move the mouse and press keys, but everything has frozen until I do a reset, after which the PC restarts normally. There is nothing in kern.log or syslog.
What is driving me crazy is that the whole system freezes while it's IDLE for a long period. However, I haven't tried coming back after 2 or 3 hours; I just notice it in the morning when I wake up and everything is locked up.

I tried disabling C6, but the same problem still exists.

Could anyone please help and tell me what's going on? Should I downgrade the BIOS? I don't know what to do or what to try :(



Replies:
Posted By: Xaltar
Date Posted: 01 Sep 2017 at 3:49pm
Try a fresh OS install if you have not already.

If that doesn't help, or you have already done so, try disabling power saving entirely in the UEFI. It sounds very much like the issue is coming from Linux not playing nice with power saving when idle. 

Linux is not officially supported, so I would not be surprised if there are issues with it and the Taichi. Linux having so many distros and variants (even within distros) makes it impossible for hardware manufacturers to properly support it. Support then must come from those responsible for the distro. 

If you want to be sure the issue is hardware related (your board), then you will need to install Windows on the system and check for stability there. I know it sucks, but that is what we are all used to using here. 

If you don't want to mess with Windows (I don't blame you), then you should seek help on the Ubuntu forums. If your issue is common, it is likely a support issue with your hardware and distro; if not, and you see you are the only one, or one of only a few, with the same issue, then hardware could be the root of the issue. There may be some more avid Linux users here who could help you out too, but you will probably get a quicker response on the distro forums. My Linux knowledge is somewhat limited and I have not used Ubuntu on my own Taichi. Lubuntu 16.04 (Ubuntu derived) works flawlessly on my system from a bootable flash drive. I don't know how much the derivative distro differs from Ubuntu, however. 




Posted By: excieve
Date Posted: 01 Sep 2017 at 6:16pm
This issue is sometimes reported by Linux users with Ryzen on various motherboards. See this thread about the infamous segfault problem on the AMD forums, which also mentions this issue:
https://community.amd.com/thread/215773

Although it's about a different issue in general, some people on that thread reported getting rid of the freezes by disabling C-states completely (not just C6). There's no clear reason why this happens, though. I've personally never experienced it with C-states enabled, even though I primarily use Linux on my Ryzen machine with an X370 Gaming K4.


Posted By: MisterJ
Date Posted: 02 Sep 2017 at 3:46am
HxwRXP, please look at this thread and see if it helps:

http://forum.asrock.com/forum_posts.asp?TID=5595&KW=ecc&PID=32901&title=ryzen-1700-on-taichi-x370-no-oc-random-hard-lock#32901

I would suggest you try the solution that worked for the poster above first. I had a similar hang problem on W10 that turned out to be bad memory slots. I tracked it down by running only one stick in one slot at a time, testing all slots. Two slots hung (though not after nearly as long as yours). I RMA'd the board and all is OK. Hope this helps. Enjoy, John.
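John's one-stick-one-slot procedure can be written down as a test plan. The sketch below is illustrative only: it assumes two sticks and four slots named A1, A2, B1 and B2 (the original post uses A2 and B2), and the helper names `single_stick_plan` and `classify_hangs` are hypothetical. It enumerates every run, then flags a slot that hung with every stick (pointing at the board) or a stick that hung in every slot (pointing at the memory).

```python
from itertools import product

def single_stick_plan(sticks, slots):
    """List every (stick, slot) run, with only one stick installed per run."""
    return list(product(sticks, slots))

def classify_hangs(results):
    """results maps (stick, slot) -> True if the system hung in that run.

    A slot that hung with every stick points at the board; a stick that
    hung in every slot points at the memory itself.
    """
    sticks = sorted({stick for stick, _ in results})
    slots = sorted({slot for _, slot in results})
    bad_slots = [slot for slot in slots
                 if all(results[(stick, slot)] for stick in sticks)]
    bad_sticks = [stick for stick in sticks
                  if all(results[(stick, slot)] for slot in slots)]
    return bad_slots, bad_sticks

# Hypothetical usage: 2 sticks x 4 slots = 8 test runs.
plan = single_stick_plan(["stick1", "stick2"], ["A1", "A2", "B1", "B2"])
```

Fill in `results` as each run finishes. If both sticks hang only in A1 and B2, for example, `classify_hangs` returns `(["A1", "B2"], [])`, a two-bad-slots pattern like the one John describes.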



-------------
Fat1 X399 Pro Gaming, TR 1950X, RAID0 3xSamsung SSD 960 EVO, G.SKILL FlareX F4-3200C14Q-32GFX, Win 10 x64 Pro, Enermax Platimax 850, Enermax Liqtech TR4 CPU Cooler, Radeon RX580, BIOS 2.00, 2xHDDs WD


Posted By: HxwRXP
Date Posted: 02 Sep 2017 at 3:41pm
Guys, thank you for your input! I'd like to give an update.

Firstly, let me say I'm not a pro in Linux, still a noob. Before upgrading my mobo, CPU and RAM, I was using an old Asus workstation mobo with an Intel CPU. Ubuntu, installed on the SSD, was working perfectly on it.
When I upgraded to the AMD CPU (specs in the original post), I used the exact same OS, i.e. the SSD from the previous setup "as is". It booted and worked just as I described, but the whole system froze after a few idle hours.

I decided to install a fresh Ubuntu on a new drive, left it idle for more than 6 hours, and nothing froze.

I may be wrong, but maybe it's because Ubuntu on the original SSD was installed on the Intel CPU. I thought just changing setups wouldn't have any effect on the system, but I think it does affect it in the long run, causing things like idle hard locks. Can anyone confirm that this could be the problem?

If this is the problem, then how come Ubuntu booted perfectly on the new system and worked perfectly until it sat idle for a couple of hours?

PS: I did run stress tests on the CPU and they went fine, but I haven't tried stress-testing the memory yet.


Posted By: MisterJ
Date Posted: 02 Sep 2017 at 11:36pm
HxwRXP, thanks for the update. I have advised several W10 users not to do what you have done. I know almost nothing about Linux, but using an OS installed on an Intel system on an AMD platform is asking for problems, just as you have assumed. I know Windows does install code specific to the processor. A fresh OS install should always be done on a new system, even if the OS and processor are not changing. Thanks and enjoy, John.




Posted By: Kirurgs
Date Posted: 06 Sep 2017 at 10:24pm
Hi!

I'm actually a full-time Linux user :)
As for installation, surprise surprise, you do not need to reinstall Linux when the hardware changes: it loads its own built-in, auto-detected drivers when the computer starts. E.g. you can install Linux on a flash drive on a desktop system with AMD, go to work, put that flash drive into an Intel system, and all will work just fine.

What I can advise: disable screen lock / screen off in "Brightness & Lock"; it may be related to that.

There is the segfault issue, which is easily triggered in Linux and not so easily on Windows, but that happens under high load.
There is also this MCE freeze/reboot problem, which is a strange one; maybe that's what you are experiencing. I have experienced it, but it just went away after BIOS updates or a voltage increase (I was running an overclock at stock voltage, which apparently was not enough, so I slightly increased the voltage). I can't say for sure which one cured it.

Also, try resetting the BIOS to defaults (load defaults), or better: disconnect all cables, take the CMOS battery out, wait 5 minutes, put it back in and reattach everything. Do not change anything in the BIOS; just boot up the system, disable screen lock / screen off and wait for the problem to reoccur.

Wardog (forum moderator), who has more contact with AMD than we do, suggests clearing CMOS before each BIOS update for the best results. Try that.

Also, pull out one of the memory modules and try to reproduce the problem.

As for me, I had the segfault issue, RMA'd the CPU, and have had no issues so far.

It may sound like a lot to do, but the problem may be in a lot of places; you just need to do one thing at a time to rule each one out.

BR, Kirurgs


Posted By: HxwRXP
Date Posted: 07 Sep 2017 at 5:52pm
Hi @Kirurgs,

Thank you for your input! Actually, my problem is still not solved. I continue to have the freezes, and YES, you are right: my PC also randomly reboots every day if it has been idle, OR, like the other day, when I was simply browsing, hardly using any CPU, and my PC just decided to reboot.

I'm so upset about this because I have paid so much money for these parts. I really hope I find the solution.

So far I have tried disabling SMT and UEFI in the BIOS, and I had another freeze today.
I have also tried disabling the screen lock like you said, and I got another freeze.

I haven't tested the RAM yet, but when I was downloading the blockchain it went so quickly that I didn't have a problem with the RAM, nor did the PC reboot under load. Every time I have experienced the random reboots, my PC has been idle. However, I have NOT seen any error messages in any logs. mcelog does not support AMD; I get this error:

mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor.  Please use the edac_mce_amd module instead.

So I don't know what to use to see MCE errors. Any idea?
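The mcelog message itself names the alternative: on family 23 (Zen) CPUs, machine-check errors are decoded into the kernel log by the `edac_mce_amd` module. A minimal sketch for checking whether that module is loaded, assuming the standard `/proc/modules` format (module name in the first column); a module built into the kernel does not appear there, and if it is simply not loaded, `sudo modprobe edac_mce_amd` should load it. The function names are illustrative.

```python
from pathlib import Path

PROC_MODULES = Path("/proc/modules")

def loaded_modules(proc_modules_text):
    """Return module names from /proc/modules text (first column of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}

def edac_mce_amd_loaded(proc_modules_text):
    """True if edac_mce_amd is listed as a loaded module."""
    return "edac_mce_amd" in loaded_modules(proc_modules_text)

if __name__ == "__main__" and PROC_MODULES.exists():
    # Only meaningful on a running Linux system.
    if edac_mce_amd_loaded(PROC_MODULES.read_text()):
        print("edac_mce_amd loaded")
    else:
        print("edac_mce_amd not loaded (or built into the kernel)")
```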

I have a feeling it could be the voltage, as in your case. I need to try that out and overclock my CPU and RAM.

What motherboard do you have?
Were you getting error messages anywhere?

A few times I have seen error code 00 on the Dr. Debug display, which, according to the mobo manual, means I must reseat the CPU.


Posted By: Kirurgs
Date Posted: 07 Sep 2017 at 6:22pm
HxwRXP,

I have the "famous" X370 Gaming K4 :)
It really seems that it is either the MCE bug or temperatures. What are your temperatures? Do they fluctuate a lot, or are they pretty stable and reasonable?
Mine are ~40C when idling with the OC; under moderate load they go up to 60C, and with prime95 or stress they reach 80C.

I remember getting MCE errors in the logs when I OC'd the system with stock CPU voltage; increasing it by +0.025 V (offset) or so made the problems go away. I also set the SOC voltage to 1.0V (fixed). By default, when I OC the memory, the MB sets SOC to 1.1V, but I think that's too high, and I found that 1.0V is fine; the default is 0.9V, I believe.

The MCE freeze/reboot problem is tricky, because nothing is written to the logs and the system just randomly reboots or freezes; like you, I got a reboot while browsing with little load.

So, try increasing the CPU voltage a little, and the SOC voltage as well. Try each memory module separately. Try taking out the BIOS battery as well to reset everything.

Check https://community.amd.com/thread/215773 as well; you may need to RMA the chip if nothing helps.

BR, Kirurgs


-------------
CPU: Ryzen 5600X
MB: Fatal1ty X370 Gaming K4 (BIOS 7.03)
RAM: CMK16GX4M2B3200C16, runs at 3200 by default on BIOS 7.03 (previously I could not go higher than 2933)


Posted By: HxwRXP
Date Posted: 08 Sep 2017 at 6:01pm
Hi Kirurgs,

Thanks for your tips.
I have tried a few more things:
1. I ran memtest86, but only waited 5 and a half hours, by which point it had reached about 50% with no errors, so I stopped it.
2. I had missed the setting to disable "Global C-States" previously, so I went ahead and disabled that as well. I haven't yet tested leaving the PC idle in Ubuntu since then, but I will do that today.
3. I decided to overclock the system from the BIOS using the P-state settings:
Pstate0: 98 - 8 - 20 (3800MHz, 1.35V)
Pstate1: 98 - 8 - 20 (3800MHz, 1.35V)
Pstate2: left at default values
Pstate3 onwards: disabled

This eventually worked, but I had to disable Global C-States first, then power-cycle the PC, and then edit the P-state values.
I then did a fresh Win10 installation, on which I ran Cinebench to check whether my system is stable. The result was good, a score of 1600 cb, and it finished successfully. I checked in the hardware monitor that the system was overclocked to 3800MHz from the default 3600MHz.
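The three numbers per P-state above appear to be the hexadecimal FID, DID and VID fields. Assuming the Zen (family 17h) relations core clock = FID / DID x 200 MHz and SVI2 voltage = 1.55 V - 0.00625 V x VID, the sketch below reproduces the 3800MHz and 1.35V quoted for `98 - 8 - 20`; the function names are hypothetical.

```python
def pstate_core_mhz(fid_hex, did_hex):
    """Core clock in MHz for a Zen P-state: FID / DID * 200 (both fields given in hex)."""
    fid = int(fid_hex, 16)
    did = int(did_hex, 16)
    return fid * 200 / did

def pstate_vid_volts(vid_hex):
    """SVI2 VID to core voltage: 1.55 V minus 6.25 mV per VID step (VID given in hex)."""
    return round(1.55 - int(vid_hex, 16) * 0.00625, 5)

# The Pstate0/Pstate1 values from the post: 98 - 8 - 20
print(pstate_core_mhz("98", "8"), "MHz")  # 0x98 = 152 -> 152 / 8 * 200 = 3800.0
print(pstate_vid_volts("20"), "V")        # 0x20 = 32  -> 1.55 - 32 * 0.00625 = 1.35
```

Under this formula a lower VID number means a higher voltage, which matters when adding voltage through the P-state fields rather than through an offset.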

I went back into Ubuntu and it was working fine for 2 or 3 hours. I then decided to restart so I could view something in the BIOS, and while I was there viewing the hardware monitor, the PC suddenly switched off. I don't know why that happened, because I didn't change anything; everything was working well at 3800MHz. I tried switching it back on, but when it reached the BIOS it switched off again. This happened a few times, so I eventually reset the BIOS with the CMOS button.

I tried to redo all the values for 3800MHz, but after a power cycle, on reaching the BIOS, the PC kept switching off, as if it were overheating or something. I'm not sure why it worked before and now I can't even overclock.
It seems as if the fans work overtime in the BIOS, but maybe my H100i V2 doesn't ramp up when in the BIOS?

In the BIOS, the CPU temperature always shows in the 70-75 degrees Celsius range.
However, in Windows, the hardware monitor showed 40 degrees at idle. The temperatures don't fluctuate much; only while running Cinebench did they go to 60 or 70 degrees in the hardware monitor.

Anyway, I don't really mind if overclocking doesn't work; I just want a stable system, but I wanted to tell you what I did so you might get an idea.
If you could please explain where exactly I need to change the voltage, so I can do what you did to get a stable system, I'd really appreciate it. I'm a bit of a noob at overclocking, so I'm not sure exactly which values I need to change. Where is the SOC? Is that Vcore or VDD SoC?

I did not change my RAM values at all; I left them at default. I think you only need to change VDD SoC if you change RAM values.

Also, I did another test: I used the kill-ryzen script to check whether my CPU is affected by the segfault problem.
I left it running for about half an hour and did get segfaults in bash. So does this mean I'm one of the users genuinely affected by this problem?
Also, and most importantly, does the segfault problem have anything to do with my PC randomly switching off or freezing after idling for about 6 hours in Ubuntu? I believe you said you had the same issue.

Thanks in advance for your help


Posted By: Kirurgs
Date Posted: 09 Sep 2017 at 5:16am
Hi!

Ehh, I think it's rather complicated in your case. See, the MCE reboot/freeze is completely different from the segfaults; at least that's what I found out, and it seems to be the case described in the forum link I posted earlier.

As for the "reboot in BIOS", that's a new one for me. It kind of suggests you may have thermal problems, or that MCE "works" in the BIOS as well :)
You see, in the BIOS my CPU always has much higher temps than in the OS (which happens to be Ubuntu). When idling, even with my overclock (from 3.0 -> 3.6), it's about 40C, but in the BIOS it's always in the high 50s, i.e. 55-59. Why that is, I do not know, but it is.
By default there is a temperature protection switched on (which I don't recommend disabling) that may interfere, but 70+ in the BIOS is high; maybe it's OK for an 1800X, but it seems quite high.

Also, I have the stock cooler that came with my CPU, with the paste pre-applied; you, obviously, don't. It might be a good idea to check how fast your fans are spinning and what the temperature is under load. One of the easiest things you can do is clean the thermal paste off and put new paste on. I use Grizzly paste: I apply a very thin layer all over the CPU (this is what Grizzly recommends) and in addition put a pea-sized dot in the middle, which works fine. I think that made my CPU 5 degrees cooler than with the stock paste.
Set the fans to the default setting (they are at default out of the box), but just make sure they are.

I think my overclock is considered "mild": I overclock only Pstate0 and leave the rest on auto, so it works like this: the highest is 3600, the next is 2550 (or so; I don't remember exactly and have no access to the Ryzen at the moment to check) and the lowest is 1550 (or so). That seems reasonable to me; I only want the max to be overclocked, not the rest.

So, reapply the thermal paste, run stress -c 16 (a command in Ubuntu) or prime95, and monitor your temps. Monitor syslog; if there are HW errors, it may have something to do with the voltage being too low. If kill-ryzen is giving you segfaults but stress or prime95 are not, you may have to RMA your CPU. I ran memtest86 for a long time with my old (faulty) CPU with no issues; it was kill-ryzen that segfaulted. When the voltage was low on my old overclocked CPU, I was getting MCE errors in syslog, and increasing the voltage helped (as expected; I was stupid to hope an overclock at default voltages would work :))
BUT, if you're running everything at default and kill-ryzen gives you segfaults, just RMA your CPU.
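The distinction between segfaults and MCE errors can be checked mechanically when monitoring syslog. A minimal sketch, assuming Ubuntu's `/var/log/syslog` (reading it may need `sudo` or membership of the `adm` group) and assuming the usual kernel message wording: user-space crashes logged as `... segfault at ...`, and machine checks tagged `mce:` or `[Hardware Error]`. The patterns and the function name are illustrative, not an exhaustive parser.

```python
import re

# Illustrative patterns; assumed typical kernel wording, not a complete list.
MCE_RE = re.compile(r"\[Hardware Error\]|\bmce:")
SEGFAULT_RE = re.compile(r"\bsegfault at\b")

def classify_kernel_log(lines):
    """Count machine-check lines vs. user-space segfault lines in syslog text."""
    counts = {"mce": 0, "segfault": 0}
    for line in lines:
        if MCE_RE.search(line):
            counts["mce"] += 1
        elif SEGFAULT_RE.search(line):
            counts["segfault"] += 1
    return counts

# Hypothetical usage on an Ubuntu box:
# with open("/var/log/syslog", errors="replace") as log:
#     print(classify_kernel_log(log))
```

A file object iterates line by line, so the whole log never has to fit in memory. A count of zero MCE lines does not clear the CPU, though: as noted earlier in the thread, the MCE freeze often leaves nothing in the logs at all.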

Br, Kirurgs




Posted By: MisterJ
Date Posted: 27 Sep 2017 at 4:32am
HxwRXP, are you still without a solution? I have not kept up with your thread. I cannot answer your questions about Linux and Ryzen. I do know that the Linux kernel has been modified for new AMD processors, so please be sure you have the latest. I also know that AMD had to release a HW change (to the processor, I think) for the segfault problem. I strongly recommend that you open an AMD support ticket and discuss it with them. It seems AMD is making the updated HW available rather sparingly. Good luck, and please keep us informed here. Thanks and enjoy, John.




Posted By: MisterJ
Date Posted: 27 Sep 2017 at 4:39am
Kirurgs, Ryzen/Threadripper have a rather well-known CPU temperature reporting oddity. At one time AMD added 20 C to the CPU readings, and programs like AIDA64 subtracted 20 C. On my system the BIOS always shows about +20 C. I wish AMD would clearly tell us what the various CPU temperatures mean, how they are used internally, and which ones we should really believe.
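Taking the 20 C figure above at face value, the correction is a fixed subtraction. A minimal sketch, assuming that offset applies to the reading in question; the offset is model-dependent, so the constant and the function name are assumptions for illustration.

```python
# Assumed offset, taken from the post above; not verified for every CPU model.
TEMP_OFFSET_C = 20

def corrected_temp(reported_c, offset_c=TEMP_OFFSET_C):
    """Subtract the fixed reporting offset from a CPU temperature reading."""
    return reported_c - offset_c

# HxwRXP's BIOS readings of 70-75 C
low, high = corrected_temp(70), corrected_temp(75)
print(f"{low}-{high} C")  # prints "50-55 C"
```

Whether a given tool has already subtracted the offset has to be checked tool by tool, which is exactly the ambiguity described above.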




Posted By: HxwRXP
Date Posted: 27 Sep 2017 at 7:13pm
Hi MisterJ !

Sorry for the late response. I've been waiting to make sure before I posted my long-term results.

So I can safely agree with Kirurgs that the segfault CPU issue has NOTHING to do with the random reboots or system freezes.

Although my CPU is definitely one of the early ones affected by the segfault bug, I won't be RMA'ing it, because it's not worth my trouble and time. I think I can live with it for now.

On another note, the random reboots and freezes have finally stopped.
What I did was:
1. I reset the BIOS to defaults after having played around with overclocks.
2. I disabled all C-states and Global C-States.
3. I disabled Cool'n'Quiet.
4. BIOS version 3.10 is the one I am using, even though 3.20 came out on 13 September (I'm too scared to upgrade).
5. Ubuntu kernel 4.4.0-96 (it was also stable on 4.4.0-93).

Right now, my system is stable and hasn't been freezing or rebooting. I have a strong feeling the f**king Global C-States setting was the cause.

If I come across any other unexpected problem, I'll be sure to update this post. Until then, thanks for your guidance!


Posted By: MisterJ
Date Posted: 27 Sep 2017 at 11:39pm
Thanks much, HxwRXP, for your update. It is great you are stable now. I am not into OCing, so I know almost nothing about C-States.
Some have had trouble with the BIOS 3.20 update and I have avoided it (3.10 on my board). As near as I can determine, it fixes only obscure (undefined) problems. The most important thing to remember is to use Instant Flash in the BIOS. Do not use the Windows method; I don't understand why ASRock has started providing that! Enjoy, John.




