Windows 11 24H2 now being pushed onto my system
Skybuck (Senior Member, Joined: 18 Apr 2023, Points: 1640)
Posted: 21 hours 37 minutes ago at 3:19am
I'm very happy Windows 11 24H2 did not install on my system.

It turns out Windows 11 24H2 updates are wrecking storage devices right now with corruption. I will post this in my Skybuck's SuperPC2023 design specification thread... assuming I can find it :)

Found the thread here: https://forum.asrock.com/forum_posts.asp?TID=26787&title=skybucks-superpc-for-2023

^ More details in there on how to stay on Windows 11 23H2 ! ;) (and also how to keep Kingston write performance top notch <- definitely worth a read).

Edited by Skybuck - 21 hours 22 minutes ago at 3:34am
Skybuck:
Diving back into the blow-ups and my hypothesis, analyzed by the AI:

I suspect this built-in GPU may have something to do with the burning/blowing up of the Ryzen 7950X3D as well as the newer models. Here is why I believe that:

1. It is powered by the same "rail": VDDCR_SOC.
2. The processor seemed to die when the screensaver kicked in on Windows.
3. The Adrenalin drivers were buggy.

What do you think of this hypothesis?

[AI response:]

Your hypothesis regarding the integrated GPU's potential role in the Ryzen 7000X3D burning issues is very interesting and touches on some of the underlying complexities of the SoC design and the actual failure mode. Let's break down your points:

1. "It is powered by the same 'rail': VDDCR_SOC"

You are absolutely correct here. The iGPU (integrated graphics processing unit) on Ryzen 7000 series CPUs (including the X3D models) draws its power from the VDDCR_SOC voltage rail. This is crucial because the core problem identified by AMD and motherboard manufacturers was indeed excessive VDDCR_SOC voltage. When this voltage went too high, it degraded or destroyed components on the SoC, which includes the iGPU, memory controller, and Infinity Fabric. Therefore, if the iGPU demanded more power, or if a bug caused it to draw current in an uncontrolled way while its voltage rail was already set too high by the BIOS, it could certainly contribute to the problem.

2. "The processor seemed to die when the screensaver kicked in on Windows."

This is a very specific observation and could be highly significant. Screensavers and display states can trigger unusual power transitions: when a screensaver activates, or when the display enters a sleep state, the system often undergoes a power-state transition involving changes in CPU core clocks, iGPU clocks, voltage adjustments, and C-states.

Potential for voltage spikes: these transitions, especially if poorly handled by the BIOS or drivers, can sometimes trigger transient voltage spikes. If the underlying issue with the excessive VDDCR_SOC auto-voltage was present, a transition that suddenly demands power from the iGPU (e.g., to render a complex screensaver or wake from a low-power state) could have been the "straw that broke the camel's back," causing a voltage overshoot on the already-too-high SoC rail.

Reports of DWM/iGPU activity in sleep/screensaver: there are indeed reports of Windows' Desktop Window Manager (DWM) or other processes causing CPU/GPU activity spikes when the display goes to sleep or screensavers are active. If your iGPU was suddenly tasked with rendering something while in a vulnerable voltage state, it would fit this scenario.

3. "The Adrenalin drivers were buggy."

AMD's Adrenalin Software (drivers) is what controls and optimizes the integrated graphics. Buggy graphics drivers can sometimes mismanage power states, clock speeds, and voltage requests, especially during idle periods, power transitions, or specific workloads. If an Adrenalin driver bug caused the iGPU to draw excessive power, request an invalid clock state, or mishandle a power transition (like entering a screensaver or sleep mode) while the VDDCR_SOC rail was already running at an elevated and unsafe auto-voltage, it could have contributed to the component stress that led to failure.

What do I think of this hypothesis? I think it is very plausible and aligns well with the known root cause of the Ryzen 7000X3D issues. Here's why: the problem wasn't solely one thing, it was a confluence of factors:

- Motherboard BIOS (the primary culprit): setting VDDCR_SOC automatically to dangerously high levels (1.35V+).
- Ryzen 7000X3D's sensitivity: these chips were more sensitive to high SoC voltage due to the stacked V-Cache impacting thermal dissipation in that region.
- Potential triggers (where your hypothesis fits): the "event" that pushed it over the edge, such as enabling EXPO (which often triggered the high auto SoC voltage), running certain workloads, or specific power-state transitions, like a screensaver activating, which could involve the iGPU and potentially a buggy driver.

Because the iGPU is part of the SoC, any stress on the iGPU, especially if mismanaged by drivers or power states, would directly impact the VDDCR_SOC rail. If that rail was already operating at an unhealthy voltage, the added stress could lead to failure.

While AMD's official statement and the subsequent BIOS fixes primarily focused on capping the VDDCR_SOC voltage, the exact "trigger" that caused the final destructive event for individual CPUs could vary. Your observation about the screensaver and buggy drivers points to a very specific and compelling trigger that could have exacerbated an already precarious voltage situation. It explains how an apparently "idle" scenario could lead to a catastrophic failure if the underlying voltage management was flawed. Therefore, your hypothesis is a well-reasoned explanation for how the iGPU and its drivers could have been an interacting factor in the failures of some Ryzen 7000X3D CPUs.
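Editor's aside: if you keep sensor logs (for example a CSV export from HWInfo64), a few lines of Python can flag whether the SoC rail ever crossed the post-fix cap discussed above. This is a minimal sketch under stated assumptions: the file name and the column name "VDDCR_SOC" are placeholders, since different tools label this sensor differently.

```python
import csv

CAP_V = 1.30  # post-AGESA-fix SoC voltage cap discussed above

def flag_high_soc(path: str, column: str = "VDDCR_SOC") -> list[float]:
    """Return every logged SoC voltage sample above the cap.

    The column name is an assumption; check your tool's actual label.
    """
    with open(path, newline="") as f:
        return [v for row in csv.DictReader(f)
                if (v := float(row[column])) > CAP_V]

# Hypothetical usage (file name is an assumption):
# spikes = flag_high_soc("hwinfo_log.csv")
# print(f"{len(spikes)} samples above {CAP_V} V" if spikes else "no samples above cap")
```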
Skybuck:
Diving a bit deeper into the RAM voltages:

Why can the RAM take 1.35 volts if the stats/specs/readout show this:

Module Voltage (VDD): Nominal: 1.1V, Operable: 1.1V, Endurant: 1.1V
Module Voltage (VDDQ): Nominal: 1.1V, Operable: 1.1V, Endurant: 1.1V
Module Voltage (VPP): Nominal: 1.8V, Operable: 1.8V, Endurant: 1.8V

[AI response:]

You've hit upon a key distinction between JEDEC specifications and XMP/EXPO profiles, which is a common point of confusion. Here's why your RAM can safely take 1.35V even though the readout shows 1.1V:

JEDEC standard (1.1V): the "Nominal: 1.1V, Operable: 1.1V, Endurant: 1.1V" in the readout refers to the JEDEC (Joint Electron Device Engineering Council) standard voltage for DDR5. JEDEC is the industry body that sets fundamental standards for memory; these standards define the minimum guaranteed operating speeds and the base voltage at which all compliant DDR5 modules must function. For DDR5, the standard operating voltage (VDD and VDDQ) is indeed 1.1V. This is designed for maximum power efficiency and guaranteed stability at base speeds (like DDR5-4800).

XMP/EXPO profiles (1.35V for your kit): XMP (Intel) and EXPO (AMD) are overclocking profiles pre-programmed onto the RAM modules by the memory manufacturer (Corsair, in your case). These profiles define higher speeds and tighter timings than the JEDEC standard, and to achieve these higher performance levels they often require higher voltages. Your Corsair CMT64GX5M2B6000Z40 kit is explicitly rated for 6000 MT/s at 1.35V according to its EXPO Profile 0. This means Corsair has tested and validated that these specific memory chips can operate reliably at 6000 MT/s using 1.35V for VDD and VDDQ. This is not "over-volting" in a dangerous sense; it's simply running the memory at its manufacturer-specified performance level, which is beyond the minimum JEDEC baseline.

Analogy: think of it like a car. The JEDEC spec is like the manufacturer saying "this car can comfortably cruise at 80 km/h on 95-octane fuel." The XMP/EXPO profile is like them saying, "but if you put in 98-octane fuel, it can also go 120 km/h and reach its optimal performance, which we've tested and guarantee." You're just giving it the "better fuel" (higher voltage) it needs to perform at its advertised, higher speed.

In essence: the 1.1V is the baseline, universally compatible voltage for DDR5. The 1.35V is the voltage specifically required and validated by Corsair for your particular memory kit to achieve its advertised 6000 MT/s speed and corresponding timings. It's perfectly safe and within the expected operating parameters for performance DDR5 RAM.

To further understand how memory operating frequency and voltage work together, you can check out "What is AEMP?" from Kingston Technology: https://www.kingston.com/en/blog/pc-performance/what-is-aemp
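To put the two operating points side by side, here is a minimal sketch. The numbers simply encode the figures quoted in this post (the JEDEC DDR5 baseline at 4800 MT/s / 1.1V and the kit's EXPO Profile 0 at 6000 MT/s / 1.35V), using the same 8-bytes-per-transfer bandwidth formula that appears later in the thread.

```python
# JEDEC baseline vs. the kit's EXPO Profile 0, as quoted in the post above.
profiles = {
    "JEDEC baseline": {"data_rate_mts": 4800, "vdd_v": 1.10, "vddq_v": 1.10, "vpp_v": 1.80},
    "EXPO Profile 0": {"data_rate_mts": 6000, "vdd_v": 1.35, "vddq_v": 1.35, "vpp_v": 1.80},
}

for name, p in profiles.items():
    # 64-bit effective bus -> 8 bytes per transfer (the thread's own formula).
    bw_gbs = p["data_rate_mts"] * 8 / 1000
    print(f"{name}: {p['data_rate_mts']} MT/s @ VDD/VDDQ {p['vdd_v']:.2f} V "
          f"-> ~{bw_gbs:.1f} GB/s theoretical")
```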
Skybuck:
I do notice something odd about HWInfo64 though:
It only displays GPU Fan 1 and GPU Fan 2, while in reality my graphics card has 3 fans ! HAHA ! Yes !

I checked in GPU Tweak 3, which I/my PC use to control the graphics card's fan curves. It also only shows: Fan Speed (Sides) and Fan Speed (Center). So apparently the two outer fans are ganged together and report as a single "Sides" speed, some kind of limitation in the design of all of this... hmm...
Skybuck:
More interesting information from AI:

The NVIDIA GeForce RTX 4070 Ti uses a PCI Express 4.0 x16 interface. Here's a breakdown of the bandwidth for that configuration:

PCIe 4.0 x16: this means it's a PCI Express Gen 4 interface using all 16 available lanes (x16). The theoretical bandwidth for PCIe 4.0 x16 is approximately 32 GB/s (gigabytes per second). To be precise: each PCIe 4.0 lane can transfer data at 2 GB/s, and since the RTX 4070 Ti uses 16 lanes, the total bandwidth is 16 lanes * 2 GB/s/lane = 32 GB/s. While this is the maximum theoretical bandwidth and real-world usage rarely saturates the entire bus, PCIe 4.0 x16 provides ample bandwidth for gaming and other graphics-intensive tasks on a powerful modern GPU like the RTX 4070 Ti. Yes, the 32 GB/s bandwidth for PCIe 4.0 x16 is the unidirectional bandwidth.

For loading a 10 GB AI model, the perceived speed is overwhelmingly dictated by your SSD's read speed and the initial CPU processing overhead, not by the difference between DDR5-3600 and DDR5-6000 system RAM. So I was wondering if enabling 6000 MT/s would benefit AI, perhaps loading, but this is determined by other bottlenecks.

For AI model inference/prompting/answering etc.: this is precisely the scenario where enabling EXPO (going from 3600 MT/s to 6000 MT/s) would have a much more noticeable and direct impact on inference speed. For models that significantly exceed your GPU's VRAM, you would likely see a noticeable increase in tokens per second (t/s) during inference, because the GPU spends less time waiting for data from system RAM. This is where the 28.8 GB/s vs 48 GB/s difference becomes a genuine bottleneck relief.

So it could be interesting to test "tokens/sec" at 3600 MT/s versus 6000 MT/s (see the sketch after this post). However, I suspect the difference will still not make local AI models worth the trouble... and using data center AI models from Gemini is the way to go for the time being, at least for big models, big context windows, etc.

I am also a little bit interested in small context windows (4K, 16K, or 32K instead of Gemini's 1M) versus local AI models (4K to xK), but Gemini is just too good currently... though some interesting tests could be done later on, for example small samples of code conversion. Long term, if it indeed gives some significant speed-up, it might be worth the trouble, but more likely in the future if I really need long-term local processing of code ! For now, even using a few minutes or 2 hours per 2 days of Gemini tokens would probably outrun my system in progress and probably quality too, so it's not yet worth running AI on the local system... it also costs some energy, sucks in more dust, and causes more wear and tear, all of which can be avoided by using Gemini...

Another idea could be to process tiny fast tasks locally instead of in the cloud/Gemini, but my software is not advanced enough yet to detect that and do that...

Bye for now, Skybuck.
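Here is the sketch referred to above: the bandwidth arithmetic from this post in runnable form. The SSD read speed is an assumed placeholder (a typical Gen4 NVMe figure), not a measurement from this system.

```python
# Bandwidth figures from the post, plus a rough model-load estimate.

PCIE4_GBS_PER_LANE = 2.0   # PCIe 4.0: ~2 GB/s per lane, per direction
LANES = 16

def ddr5_bandwidth_gbs(mt_per_s: int) -> float:
    """Theoretical dual-channel DDR5 bandwidth: 8 bytes per transfer."""
    return mt_per_s * 8 / 1000

pcie_gbs = PCIE4_GBS_PER_LANE * LANES      # 32 GB/s unidirectional
model_gb = 10.0                            # the 10 GB AI model example
ssd_gbs = 3.5                              # assumed NVMe sequential read speed

print(f"PCIe 4.0 x16: {pcie_gbs:.0f} GB/s unidirectional")
print(f"DDR5-3600: {ddr5_bandwidth_gbs(3600):.1f} GB/s, "
      f"DDR5-6000: {ddr5_bandwidth_gbs(6000):.1f} GB/s")
print(f"Loading {model_gb:.0f} GB at {ssd_gbs} GB/s from SSD: "
      f"~{model_gb / ssd_gbs:.1f} s (the SSD, not RAM speed, dominates)")
```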
Skybuck:
This happened while trying out the "sensors" part of HWInfo64:

A PCI Lock warning: abort, retry, cancel, etc... I tried retry a few times; it kept showing up. It did report which app was responsible for the lock: iCue... So eventually I shut down iCue, exited it from the system tray, restarted HWInfo64, and then no more complaints. So at least this allows me to inspect temperatures, voltages, settings, etc. Afterwards I shut down HWInfo64 and restarted iCue, and everything should be fine again. Though I do now wonder if maybe iCue could have an effect on system performance, not sure yet...
Skybuck:
Hmmmmm, I just tried HWInfo64; it complained iCue was holding the PCI bus or something too long: PCI Lock.

I exited iCue from the system tray and this resolved the issue, but now the memory LEDs fall back to their programmed pattern... so this is a bit of a strange situation...
Skybuck:
One more future note so I can let it flow out of my brain lol:
I let Gemini Flash 2.5 analyze the memory situation/settings of my system. The end conclusion of Gemini Flash 2.5 is more or less the following: the memory chips are capable of running at 6000 MT/s, and the system can do it as well; however, getting it stable might be difficult.

In case I or others with a similar system want to try it in the future, the AI recommends:

1. Install the latest BIOS to increase the chance of stability.
2. Of course, at least version 1.28 for this particular system should be installed, to prevent the SoC voltage going to the processor from exceeding 1.25 volts or 1.305 volts. (On YouTube I saw somebody set it to 1.2 volts to begin with.)
3. Enable AMD EXPO.

The expected performance gains are roughly 20% lower latency and 66% more theoretical bandwidth. In certain use cases this might be noticeable, like perhaps virtual machines, or maybe loading AI models. However, during benchmarking by others long ago, they only showed 1% or 2% performance improvements, probably because of the 3D V-Cache, or they didn't test it right ;)

The risk of damage is low according to the AI, because it was all designed for this. However, system instability is still a risk. For me system stability is very important, so I am perfectly fine with running memory at 3600 MT/s, which is still a very fast system compared to what I used to run, and the real-world responsiveness I am getting is lightning fast for a human, so I have little desire to experiment with this for the time being. However, it does make me a little bit curious. On the other hand, if this system were to be damaged, a replacement would not be easy, since newer models seem to be blowing up, and it would be a lot of work to take the system apart or build a new one. So the benefits do not outweigh the risks. But it is interesting to note this down for the future, in case I buy a better/new or backup computer and want to experiment with this system to see what it could really do.

Finally, I will throw in some numbers computed by the AI (see the worked sketch after this post):

Revisiting the timings (30-30-30-58): knowing it's DDR5 running at 3600 MT/s (1796.8 MHz clock), let's calculate the real-world latency in nanoseconds (ns) with the current timings, tCAS-tRCD-tRP-tRAS = 30-30-30-58:

Clock period: 1 / 1796.8 MHz = 0.5566 nanoseconds per clock cycle.
tCAS latency: 30 cycles * 0.5566 ns/cycle = 16.698 ns
tRCD latency: 30 cycles * 0.5566 ns/cycle = 16.698 ns
tRP latency: 30 cycles * 0.5566 ns/cycle = 16.698 ns
tRAS latency: 58 cycles * 0.5566 ns/cycle = 32.28 ns

Supported module timing at 3000.0 MHz (EXPO Profile 0): 40-40-40-77. This is the key EXPO profile the RAM is designed for. At a true clock speed of 3000 MHz (which is 6000 MT/s), the timings are 40-40-40-77:

CAS latency in ns (EXPO): 40 cycles * (1 / 3000 MHz) = 13.33 ns

Comparison of real-world latency:

Current (DDR5-3600 CL30): ~16.7 ns
EXPO (DDR5-6000 CL40): ~13.3 ns

This confirms that by enabling EXPO, you'd be reducing your primary CAS latency by approximately 20% (16.7 ns to 13.3 ns), in addition to the massive increase in bandwidth (3600 MT/s to 6000 MT/s). In summary, by enabling EXPO you are looking at: a substantial increase in memory bandwidth (66% theoretical); a noticeable reduction in absolute CAS latency (around 20%); and tangible performance gains in CPU-intensive games and applications, especially smoother gameplay due to better 1% lows.

More:

1. Absolute latency (in nanoseconds):

Current (3600 MT/s @ 1796.8 MHz clock, 30-30-30): CAS latency = 30 cycles * (1 / 1796.8 MHz) = 16.7 ns
With EXPO (6000 MT/s @ 3000 MHz clock, likely CL40): CAS latency = 40 cycles * (1 / 3000 MHz) = 13.3 ns

You're looking at a ~20% reduction in CAS latency in absolute terms. This means the time it takes for the CPU to receive data from the RAM is significantly reduced.

2. Bandwidth (theoretical):

Current (DDR5-3600, dual channel): 3600 MT/s * 8 bytes/transfer (64-bit bus) = 28.8 GB/s
With EXPO (DDR5-6000, dual channel): 6000 MT/s * 8 bytes/transfer = 48 GB/s

This is a massive 66% increase in theoretical memory bandwidth! Your system can move data to and from the RAM much faster.

Concerning voltage settings: 1.35V for DRAM (VDD/VDDQ) is completely normal and safe for DDR5-6000. The 1.25V / 1.3V limit you're thinking of applies to the CPU SoC voltage. During the initial Ryzen 7000 launch, some motherboard BIOS versions (before patches) were automatically pushing the CPU SoC voltage too high when EXPO was enabled, sometimes going over 1.4V. This excessive SoC voltage was indeed found to potentially damage the CPU's memory controller (leading to "burn-outs"). AMD quickly addressed this with AGESA updates, and motherboard manufacturers followed suit with new BIOS versions. These updates now cap the CPU SoC voltage at a maximum of 1.3V (with 1.25V often being the recommended "sweet spot" for stability and longevity). When you enable EXPO, the motherboard should now automatically apply a safe SoC voltage that adheres to these new limits, typically 1.25V or slightly higher, but staying below 1.3V.

Key confirmations and insights:

Corsair CMT64GX5M2B6000Z40: this confirms the exact part number for the RAM kit, which is indeed a 64GB (2x32GB) DDR5-6000 CL40 kit. The "Z40" at the end of the part number specifically indicates the CL40 latency.

SK Hynix A-die: this is excellent news! SK Hynix A-die is currently one of the best DRAM ICs (chips) for DDR5 overclocking and stability, known for its ability to reach high frequencies and relatively tight timings.

Module size: 32 GBytes, number of ranks: 2 (dual rank): you have two 32GB dual-rank modules. This confirms your 64GB total memory is achieved with two modules, which is generally easier to stabilize at higher speeds than four modules, although 32GB dual-rank modules still put more strain on the memory controller than 16GB single-rank modules.

Memory speed: 3000.0 MHz (DDR5-6000 / PC5-48000): this is the rated effective data rate for the modules via their EXPO profile.

For now I am going to take what the AI wrote with a grain of salt. I do believe it somewhat, but confirmations by others would be nice, to make sure the AI is not hallucinating :) (There was more text, but I copied & pasted the most essential parts of it ! ;))

Laterz ! Bye, Skybuck ! ;) =D
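Here is the worked sketch mentioned above, reproducing the AI's latency arithmetic so anyone can re-check it. Timings are specified in memory clock cycles, and DDR transfers twice per clock, so the true clock is half the MT/s figure.

```python
def latency_ns(mt_per_s: float, cycles: int) -> float:
    """Convert a timing in clock cycles to nanoseconds."""
    clock_mhz = mt_per_s / 2        # DDR: two transfers per clock
    return cycles * 1000 / clock_mhz

# Current: DDR5-3600 readout (1796.8 MHz clock), CL30; EXPO: DDR5-6000, CL40.
current = latency_ns(3593.6, 30)    # ~16.7 ns
expo    = latency_ns(6000.0, 40)    # ~13.3 ns
tras    = latency_ns(3593.6, 58)    # ~32.3 ns (tRAS at current settings)

print(f"CAS now: {current:.1f} ns, CAS with EXPO: {expo:.1f} ns, "
      f"reduction: {100 * (1 - expo / current):.0f}%")
print(f"Bandwidth: {3600 * 8 / 1000:.1f} -> {6000 * 8 / 1000:.1f} GB/s "
      f"(+{100 * (6000 / 3600 - 1):.1f}%)")
```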
Skybuck:
Two more things to take note of:

1. The built-in GPU in the Ryzen processor was disabled some months ago; I found the AMD Ryzen graphics drivers buggy for this built-in GPU. However, ASRock support shows new VGA drivers available for this built-in GPU (and also for other Radeon graphics cards). I am not sure if AMD has done something to make their drivers work better for this built-in GPU. For now I won't test it again, because I want butter-smooth PC operation ! ;) But I might test it again in the future, and I did download a driver in case I want to install/try it out. To do this I would have to re-enable the built-in GPU in the BIOS and install this new driver:

AMD_VGA(v24.30.66.250610a).zip
Stored under: G:\Downloads\Drivers\Motherboard\ASRock B650E Steel Legend WiFi\AMD graphics driver (Adrenalin)

There is also another folder where I downloaded these Adrenalin drivers from AMD's site instead of ASRock's site. They contain somewhat older/different drivers, or the same; I am not sure if ASRock changes anything in them, I think they are more or less the same:

G:\Downloads\Drivers\Processor (Embedded Graphics)

Last version I downloaded and tried some time ago, but it was buggy:
whql-amd-software-adrenalin-edition-24.10.1-win10-win11-oct-rdna.exe
^ Downloaded 4 December 2024.

I just downloaded a new version, just in case AMD gets bombed in WW3 lol:
whql-amd-software-adrenalin-edition-25.6.1-win10-win11-june5-rdna.exe
^ Downloaded today, 23 July 2025. Have not tried it yet, but might do so in the future.

I see there is also a slightly newer version: Adrenalin 25.6.3 (Optional)
https://drivers.amd.com/drivers/amd-software-adrenalin-edition-25.6.3-win10-win11-june-rdna.exe
^ But it's not WHQL certified, so I am not going to download that one, though sometimes those optional releases work better/are newer. Will not download it for now; have not tried it; will skip it if there is something newer in the future.

2. I also saved a HWInfo64 log and uploaded it to my webdrive. It might come in handy in the future if my SuperPC2023 ever dies and I want to know what kind of characteristics/performance it had compared to any new system ! ;)

Here is the log file from HWInfo64 for Skybuck's SuperPC 2023:
https://www.skybuck.org/Hardware/SuperPC2023/HWInfo64-SUPERPC2023-23-july-2025.LOG

Bye for now, Skybuck.
Skybuck:
HWInfo64 removed and the new version v8.28-5770 installed and working.

(This tool seems to manipulate/interact with the SMBus / I2C, so care must be taken not to accidentally click on some things or whatever.) It can generate a full report of the system... but I see no way of including such a long report in this forum... So everything is working again.

To be more specific about the lighting settings for iCue/DRAM:

Lighting Link 1 / Quick Lighting Zone: Dimm #1 set to: AMD Ryzen 9 7950X3D Package
Lighting Link 2 / Quick Lighting Zone: Dimm #2 set to: ASRock B650E Steel Legend WiFi Temp #1
Lighting Link 3 / Quick Lighting Zone: Dimm #3 set to: ASRock B650E Steel Legend WiFi Temp #4
Lighting Link 4 / Quick Lighting Zone: Dimm #4 set to: Dominator Platinum RGB DDR5 Temp #4

(At lighting link #3 I had Temp #3 from the mobo, but I decided to switch to Temp #4 because it's slightly hotter. I am not yet sure which parts of the mobo are being heat-measured here, so that's where I am a little bit unsatisfied, but perhaps in time with HWInfo64 I can figure it out further. This is something where mobo makers/sensors/HWInfo64 could do some more work to improve upon in the future.)

All temperature colors set to (shaded in between; see the sketch after this post):

Green: 20 degrees Celsius
Yellow: 40 degrees Celsius
Red: 60 degrees Celsius

Current ambient/room temperature is 24.0 degrees Celsius:

Dimm 1: yellow/orangish
Dimm 2: light greenish
Dimm 3: yellow/orangish
Dimm 4: yellowish green

Power supply temperature: 25 Celsius. Power supply fan RPM set to approx 1000 RPM. Very silent system at the moment. I do have the balcony door open for fresh air, and I kinda like it, though I might close it soon.

Inside ASUS Armoury Crate: (the Dominator set is not present anymore; I guess I delinked it in iCue or something, which is good, because I don't want Armoury Crate controlling it since it doesn't have access to all sensors...)

ASUS Aura Sync -> Aura Effects -> Smart <- click it and it shows: By GPU Temperature (select it ;)). Low point: 60 degrees Celsius, high point: 85 degrees Celsius. So far the GPU has been greenish the entire summer hehe.

This is a cool/nice setup; you can quickly see temperatures at a glance, assuming it works, and it does seem to work ;) It's an amazing system, it continues to amaze me every day ! ;) XD =D

Later, bye for now, Skybuck
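As referenced above, here is a minimal sketch of the kind of temperature-to-color shading described (green at 20, yellow at 40, red at 60, blended in between). It just illustrates linear blending between color stops; it is not iCue's actual algorithm.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def temp_to_rgb(temp_c: float) -> tuple[int, int, int]:
    """Blend between the color stops used in the post: green/yellow/red."""
    stops = [(20.0, (0, 255, 0)), (40.0, (255, 255, 0)), (60.0, (255, 0, 0))]
    if temp_c <= stops[0][0]:
        return stops[0][1]
    if temp_c >= stops[-1][0]:
        return stops[-1][1]
    for (t0, c0), (t1, c1) in zip(stops, stops[1:]):
        if t0 <= temp_c <= t1:
            f = (temp_c - t0) / (t1 - t0)
            return tuple(round(lerp(a, b, f)) for a, b in zip(c0, c1))

print(temp_to_rgb(24.0))  # ambient 24 C -> (51, 255, 0), a yellowish green
```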