Taichi X370 with Ubuntu idle lock ups/idle freeze

MisterJ (Senior Member, joined 19 Apr 2017)
Posted: 27 Sep 2017 at 11:39pm
Thanks much, HxwRXP, for your update.  It is great that you are stable now.  I am not into OCing, so I know almost nothing about C-States.
Some have had trouble with the BIOS 3.20 update and I have avoided it (3.10 on my board).  As near as I can determine, it fixes only obscure problems (not defined).  The most important thing to remember is to use Instant Flash in the BIOS.  Do not use the Windows method; I don't understand why ASRock has started providing that!  Enjoy, John.
Fat1 X399 Pro Gaming, TR 1950X, RAID0 3xSamsung SSD 960 EVO, G.SKILL FlareX F4-3200C14Q-32GFX, Win 10 x64 Pro, Enermx Platimax 850, Enermx Liqtech TR4 CPU Cooler, Radeon RX580, BIOS 2.00, 2xHDDs WD

HxwRXP (Newbie, joined 01 Sep 2017)
Posted: 27 Sep 2017 at 7:13pm
Hi MisterJ!

Sorry for the late response. I've been waiting to make sure before posting my long-term results.

So I can safely agree with Kirurgs that the segfault CPU issue has NOTHING to do with the random reboots or system freezes.

Although my CPU is definitely one of the early ones affected by the segfault bug, I won't be RMA'ing it, because it's not worth my trouble and time. I think I can live with it for now.

On another note, the system instability (random reboots or freezes) has finally stopped.
What I did was:
1. I reset the BIOS to defaults after having played around with overclocks.
2. I disabled all C-states and Global C-States.
3. I disabled Cool'n'Quiet.
4. I am staying on BIOS version 3.10, even though 3.20 came out on 13 September (I'm too scared to upgrade).
5. Ubuntu kernel 4.4.0-96 (it was also stable on 4.4.0-93).

Right now, my system is stable and hasn't been freezing or rebooting.  I have a strong feeling the f**king Global C-States setting was the cause.
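
For anyone who wants to confirm from inside Ubuntu what the BIOS C-state settings actually did, here is a minimal Python sketch that lists the idle states the kernel exposes through the standard cpuidle sysfs directory. The function name is made up for illustration, and the expectation that only shallow states (such as POLL and C1) remain after disabling Global C-States is an assumption, not something verified in this thread.

```python
from pathlib import Path


def list_cpuidle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(state_dir, name, disable_flag), ...] for one CPU.

    Returns an empty list when the kernel exposes no cpuidle directory.
    """
    base = Path(cpu_dir)
    if not base.is_dir():
        return []
    states = []
    for state in sorted(base.glob("state*")):
        name_file = state / "name"
        disable_file = state / "disable"
        name = name_file.read_text().strip() if name_file.exists() else "?"
        disabled = disable_file.read_text().strip() if disable_file.exists() else "?"
        states.append((state.name, name, disabled))
    return states


if __name__ == "__main__":
    # Prints one tuple per idle state of cpu0, e.g. the state directory and its name.
    for entry in list_cpuidle_states():
        print(entry)
```

Running it before and after changing the BIOS setting gives a quick way to compare which states the OS can still enter.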

If I come across any other unexpected problem, I'll be sure to update this post. Till then, thanks for your guidance!

MisterJ (Senior Member, joined 19 Apr 2017)
Posted: 27 Sep 2017 at 4:39am
Kirurgs, Ryzen/Threadripper have a rather well-known CPU temperature reporting oddity.  At one time AMD added 20 C to the CPU readings and programs like AIDA64 subtracted 20 C.  On my system the BIOS always shows readings about 20 C higher.  I wish AMD would clearly tell us what the various CPU temperatures mean, how they are used internally, and which ones we should really believe.

MisterJ (Senior Member, joined 19 Apr 2017)
Posted: 27 Sep 2017 at 4:32am
HxwRXP, are you still without a solution?  I have not kept up with your thread.  I cannot answer questions you have about Linux and Ryzen.  I do know that the Linux kernel has been modified for new AMD processors.  Please be sure you have the latest.  I also know that AMD was required to release a HW change (processor, I think) for the segfault problem.  I strongly recommend that you open an AMD support ticket and have a discussion with them.  It seems like AMD is making the updated HW available rather sparingly.  Good luck and please keep us informed here.  Thanks and enjoy, John.

Kirurgs (Newbie, joined 09 Jun 2017)
Posted: 09 Sep 2017 at 5:16am
Hi!

Ehh, it's rather complicated in your case, I think. See, the MCE reboot/freeze is completely different from the segfaults; at least that's what I found out, and it seems to be the case described in the forum link I posted in my previous posts.

Regarding "reboot in BIOS", that's a new thing for me. It kind of suggests you may have thermal problems, or that MCE "works" in BIOS as well :)
See, in BIOS my CPU always shows much higher temps than in the OS (which happens to be Ubuntu). When idling, even with my overclock (from 3.0 -> 3.6), it's about 40C in the OS, but in BIOS it's always in the high 50s, meaning 55-59. Why that is, I do not know, but it is.
By default there is a temperature protection switched on (which I don't recommend disabling) which may interfere. But 70+ in BIOS, that's high; maybe it's OK for an 1800X, but it seems quite high.

Also, I have the default cooler which came with my CPU, with paste pre-applied; you, obviously, don't. It might be a good idea to check how fast your fans are going and what the temp is under load. One of the easiest things you can do is clean the thermal paste off and put new paste on. I use grizzly paste: I apply a very thin layer all over the CPU (this is what grizzly recommends) and in addition I put a pea-sized dot in the middle; that works fine. I think that made my CPU 5 degrees cooler than with the default paste.
Set the fans to the default setting (they should be on default already, but just make sure they are).

I think my overclock is considered "mild": I overclock only pstate0 and the rest are set to auto, so it works like this: highest is 3600, next is 2550 (or so, I don't remember exactly and I have no access to the Ryzen at the moment to check) and lowest is 1550 (or so). To me that seems reasonable; I only want the max to be overclocked, not the rest.

So, reapply thermal paste, run stress -c 16 (a command in Ubuntu) or prime95, and monitor your temps. Monitor syslog: if there are HW errors, the voltage may be too low. If kill-ryzen is giving you segfaults but stress or prime95 are not, you may have to RMA your CPU. I ran memtest86 for a long time with my old (faulty) CPU with no issues; it was kill-ryzen which segfaulted. When the voltage was too low on my old overclocked CPU, I was getting MCE errors in syslog, and increasing the voltage helped (as expected; I was stupid to hope an overclock at default voltages would work :))
BUT, if you're running everything at defaults and kill-ryzen gives you segfaults, just RMA your CPU.
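
The "monitor syslog for HW errors" step can be sketched roughly as follows in Python. The search patterns ("mce:", "machine check", "hardware error") are an assumption about how the kernel words such messages, the function name is made up for illustration, and /var/log/syslog is simply the usual Ubuntu log location; adjust all three to whatever your system actually logs.

```python
import re
from pathlib import Path

# Assumed wording of machine-check related kernel messages; extend as needed.
MCE_PATTERN = re.compile(r"mce:|machine check|hardware error", re.IGNORECASE)


def find_mce_lines(lines):
    """Return the lines (without trailing newline) that look MCE-related."""
    return [line.rstrip("\n") for line in lines if MCE_PATTERN.search(line)]


if __name__ == "__main__":
    log = Path("/var/log/syslog")
    try:
        with log.open(errors="replace") as handle:
            for hit in find_mce_lines(handle):
                print(hit)
    except OSError as exc:
        # Missing file or insufficient permissions: report and move on.
        print(f"could not read {log}: {exc}")
```

An empty result does not prove the hardware is fine; as described in this thread, the freeze/reboot case may leave nothing in the logs at all.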

Br, Kirurgs
CPU: Ryzen 5600X
MB: Fatal1ty X370 Gaming K4 (BIOS 7.03)
RAM: CMK16GX4M2B3200C16, works with 3200 by default on 7.03 bios (previously I could not go higher than 2933)

HxwRXP (Newbie, joined 01 Sep 2017)
Posted: 08 Sep 2017 at 6:01pm
Hi Kirurgs,

Thanks for your tips.
I have tried a few more things:
1. I ran memtest86 but only waited five and a half hours, by which point it had reached about 50% with no errors, so I stopped it.
2. I had previously missed the setting to disable "Global C-States", so I went ahead and disabled that as well.  I haven't yet tested leaving the PC idle in Ubuntu after that, but I will do that today.
3. I decided to overclock the system from the BIOS using the Pstate settings:
Pstate0 - 98 - 8 - 20  (3800MHz, 1.35V)
Pstate1 - 98 - 8 - 20  (3800MHz, 1.35V)
Pstate2 - I left on default values.
Pstate3 onwards I disabled.
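
For readers wondering how "98 - 8 - 20" turns into 3800MHz at 1.35V, here is a small Python sketch of the commonly cited first-generation Ryzen P-state arithmetic. It assumes the three values are hexadecimal FID, DID and VID in that order, that frequency = FID / DID x 200 MHz, and that voltage = 1.55 V - 0.00625 V x VID. Those formulas are an assumption on my part rather than something stated in the post, although they do reproduce the numbers quoted above; the function name is made up for illustration.

```python
def decode_pstate(fid_hex, did_hex, vid_hex):
    """Convert hex FID/DID/VID strings into (frequency in MHz, core voltage in V).

    Assumed formulas: freq = FID / DID * 200 MHz, vcore = 1.55 - 0.00625 * VID.
    """
    fid = int(fid_hex, 16)
    did = int(did_hex, 16)
    vid = int(vid_hex, 16)
    freq_mhz = fid / did * 200.0
    vcore = 1.55 - 0.00625 * vid
    return freq_mhz, vcore


if __name__ == "__main__":
    freq, volts = decode_pstate("98", "8", "20")
    print(f"{freq:.0f} MHz, {volts:.3f} V")  # prints "3800 MHz, 1.350 V"
```

Under the same assumptions, a lower VID number means a higher voltage, so it is worth double-checking the decoded voltage before saving a P-state.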

This eventually worked, but I had to disable Global C-States first, then power cycle the PC, and then edit the Pstate values.
I then did a fresh Win10 installation and ran Cinebench to see whether my system was stable. The result was good, with a score of 1600 cb, and it finished successfully. I checked in the hardware monitor that the system was overclocked to 3800MHz from the default 3600MHz.

I went back into Ubuntu and was working fine for 2 or 3 hours. I then decided to restart so that I could view something in the BIOS, and while I was there viewing the hardware monitor, the PC suddenly switched off.  I don't know why that happened, because I didn't change anything. Everything was working well at 3800MHz.  I tried switching it back on, but when it reached the BIOS it switched off again. This happened a few times, so I eventually reset the BIOS with the CMOS button.

I tried to set all the values back to 3800MHz, and after a power cycle the PC kept switching off once it reached the BIOS. Like it was overheating or something? I'm not sure why it worked before and now I can't even overclock.
It seems as if the fans work overtime in BIOS, but maybe my H100i V2 is not increasing its power when in BIOS?

My CPU temps in BIOS are always in the 70-75 degrees Celsius range.
However, in Windows the hardware monitor showed 40 degrees at idle. The temps don't fluctuate too much; only when running Cinebench did they go to 60 or 70 degrees in the hardware monitor.

Anyway, I don't really mind if overclocking doesn't work; I just want a stable system. I just wanted to tell you what I did so you can maybe get an idea.
If you can, please explain to me exactly where I need to change the voltage so that I can do what you did to get a stable system; I'd really appreciate that.  I'm a bit of a noob at overclocking, so I'm not sure exactly which values I need to change.  Where is the SOC setting? Is that the Vcore, or VDD SoC?

I did not change my RAM values at all; I left them at default. I think you only need to change VDD SoC if you change RAM values.

Also, I did another test: I used the kill-ryzen script to check whether my CPU is affected by the segfault problem.
I left it running for about half an hour and I did get segfaults in bash. So does this mean I'm one of those users who are genuinely affected by this problem?
Also, and most importantly, does the segfault problem have anything to do with my PC randomly switching off or freezing after idling for about 6 hours in Ubuntu?  Because I believe you said you had the same issue.

Thanks in advance for your help

Kirurgs (Newbie, joined 09 Jun 2017)
Posted: 07 Sep 2017 at 6:22pm
HxwRXP,

I have the "famous" X370 Gaming K4 :)
It really seems to be either the MCE bug or temperatures. What are your temperatures? Do they fluctuate a lot, or are they pretty stable and reasonable?
Mine are ~40C when overclocked and idling; under moderate load, up to 60C; with prime95 or stress, they reach 80C.

I remember getting MCE errors in the logs when I overclocked the system with stock CPU voltage; after increasing it by +0.025 (offset) or so, the problems went away. I also set the SOC voltage to 1.0V (fixed). By default, when I overclock the memory the MB sets SOC to 1.1, but I think that's too high and I found that 1.0 is fine; the default is 0.9, I believe.

The MCE freeze/reboot problem is tricky, because nothing is written to the logs and the system just randomly reboots or freezes. Like you, I got a reboot while browsing with little load.

So, try increasing the CPU voltage a little, and SOC as well. Try each memory module separately. Try taking out the BIOS battery as well to reset everything.

Check https://community.amd.com/thread/215773 as well. You may need to RMA the chip if nothing helps.

BR, Kirurgs


Edited by Kirurgs - 07 Sep 2017 at 6:58pm

HxwRXP (Newbie, joined 01 Sep 2017)
Posted: 07 Sep 2017 at 5:52pm
Hi @Kirurgs,

Thank you for your input!  Actually my problem is still not solved. I continue to have the freezes, and YES, you are right: my PC also randomly reboots every day if I have left it idle. Or, like the other day, I was simply browsing, hardly using any CPU, and my PC just decided to reboot.

I'm so upset about this because I have paid so much money for these parts. I really hope I find the solution.

So far I have tried disabling SMT and UEFI in the BIOS, and I had another freeze today.
I have also tried disabling the screen lock like you said, and I got another freeze.

I haven't tested the RAM yet, but when I was downloading the blockchain it went so quickly that I don't think there is a problem with the RAM, and the PC did not reboot under load.  Every time I have experienced the random reboots, my PC has been idle.  However, I have NOT seen any error messages in any logs.  mcelog does not support my AMD processor.  I get this error:

mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor.  Please use the edac_mce_amd module instead.

So I don't know what to use to see MCE errors. Any idea?
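
Since the mcelog message points at the edac_mce_amd module, one first step is to check whether that module is loaded at all. Below is a minimal Python sketch that looks for it in /proc/modules. The function name is made up for illustration, and note the caveat that a driver compiled directly into the kernel would not appear in that file, so a False result is not conclusive.

```python
from pathlib import Path


def module_loaded(name, modules_text):
    """Return True if `name` appears as a module name in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The module name is the first whitespace-separated field on each line.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    try:
        print(module_loaded("edac_mce_amd", proc_modules.read_text()))
    except OSError as exc:
        print(f"could not read {proc_modules}: {exc}")
```

If the module is present, decoded machine-check messages would be expected in the kernel log rather than in mcelog output.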

I have a feeling it could be the voltage, as it was for you. I need to try that out and overclock my CPU and RAM.

What motherboard do you have?
Were you getting error messages anywhere?

A few times I have seen code 00 on the Dr. Debug display, which according to the mobo manual means I must reseat the CPU.

Kirurgs (Newbie, joined 09 Jun 2017)
Posted: 06 Sep 2017 at 10:24pm
Hi!

I'm actually a full-time Linux user :)
As for installation, surprise surprise, you do not need to reinstall Linux when the HW changes; it loads its own built-in, auto-detected drivers when the computer starts. E.g. you can install Linux on a flash drive on a desktop system with AMD, go to work, plug that flash drive into an Intel system, and all will work just fine.

What I can advise you: disable screen lock / screen off in "Brightness & Lock"; it may be related to that.

There is the segfault issue, which is easily triggered in Linux and not that easily in Windows, but that happens under high load.
Then there is this MCE freeze/reboot problem, which is a strange one; maybe you are experiencing that. I have experienced it, but it just went away after BIOS updates or a voltage increase (I ran an overclock on stock voltage, which apparently was not enough, so I slightly increased the voltage). I can't say for sure what cured it.

Also, try resetting the BIOS to defaults (load defaults) or, better, disconnect all cables, take the CMOS battery out, wait 5 minutes, put it back in, reattach everything, do not change anything in the BIOS, just boot up the system, disable screen lock / screen off, and wait for the problem to reoccur.

Wardog (forum moderator), who has more contact with AMD than we do, suggests clearing CMOS before each BIOS update for the best results. Try that.

Also, pull out one of the memory modules and try to reproduce the problem.

As for me, I had the segfault issue and RMAd the CPU; no issues so far.

It may sound like a lot to do, but the problem may be in a lot of places. You just need to do one thing at a time, to rule each one out.

BR, Kirurgs


Edited by Kirurgs - 06 Sep 2017 at 10:26pm

MisterJ (Senior Member, joined 19 Apr 2017)
Posted: 02 Sep 2017 at 11:36pm
HxwRXP, thanks for the update.  I have advised several W10 users not to do what you have done.  I know almost nothing about Linux, but using an OS installation carried over from an Intel system on an AMD platform is asking for problems, just as you have assumed.  I know Windows does install code specific to the processor.  A fresh OS install should always be done on a new system, even if the OS and processor are not changing.  Thanks and enjoy, John.