WHEA Uncorrectable Error among others


  1. Posts : 6
    Windows 10
       #1


    Hello.

    My PC has been blue-screening more and more often lately. It went from maybe once a month to several times a day, especially while gaming, although it also happens when I'm just on Facebook. The most frequent stop codes are WHEA Uncorrectable Error and DPC Watchdog Violation. I've run most of the common tests I could find that might point to the problem, including 4 passes of MemTest86, and everything comes up clean. At this point I'm convinced I'm going to have to replace some part of the PC, but I still don't know exactly what the problem is.

    Microsoft OneDrive - Access files anywhere. Create docs with free Office Online.

    Thank you for your time.


  2. Posts : 392
    W10
       #2

    You have 6 drives installed. More than 3 makes me wonder what else is installed and whether the PSU is strong enough to handle it all. I suggest disconnecting several of the drives (both data and power cables), along with any other components you don't need. Testing this way will help rule out the PSU as the problem.

    Beyond that, this seems to be a hardware problem.
    I suggest starting with these free diagnostics:

    - MemTest86 (4 passes): MemTest86 - Official Site of the x86 Memory Testing Tool

    - Seagate SeaTools for Windows (long/extended test on ALL drives): SeaTools for Windows | Seagate

    - OCCT (all tests): OCBASE / OCCT

    - Furmark (run until temps level out or there are problems, whichever comes first): FurMark > Home

    - Prime95 Stress Testing (run for at least an hour, or until the temps level out. If you get errors, STOP IMMEDIATELY): GIMPS - Free Prime95 software downloads - PrimeNet

    Finally, troubleshoot by hardware stripdown (this is an older post and some of the links are broken - but it's still relevant to your situation): Hardware Stripdown Troubleshooting

    - - - Updated - - -

    BIOS/UEFI version F3 dates from March of 2018.
    Please check the Gigabyte support website for your motherboard and install any stable BIOS update that's available.
    Last edited by jdc1; 15 May 2020 at 07:11.


  3. Posts : 6
    Windows 10
    Thread Starter
       #3

    Thank you for your response.

    I updated the BIOS and I'm going to run the rest of the tests now. I ran 4 passes of MemTest86 8 days ago, and there were 0 problems. Should I run it again in case the new BIOS changed something?

    My PSU is a Corsair TX 650M. My reasoning would be that if there wasn't enough power, it would just shut down instead of getting a BSOD. There have been cases where the pc would just restart, but they are rare among the general chaos of the constant crashing.


  4. Posts : 392
    W10
       #4

    Please run MemTest again. It probably won't have changed - but we're checking things just in case.

    The PSU can do strange things to your system, including BSODs. I've seen a lot of them over the years - that's why I check the number of drives on each BSOD topic that I reply to.
    It won't hurt to disconnect some of the non-essential stuff, and then you'll know whether the PSU is the problem.
    There's also a PSU test in the OCCT test suite. While software tests aren't 100% accurate, they give us something to start with.


  5. Posts : 6
    Windows 10
    Thread Starter
       #5

    Alright, so: MemTest86 completed 4 passes with zero errors.

    Seagate SeaTools: all my drives passed the long test. The SSDs took around 10 mins each, while the others took literally hours.

    OCCT found no errors after ~15 mins of running each test.

    Furmark found no problems either. The temp went up to ~81C after 2.5 mins, then stayed there for the half hour I kept it running.

    Prime95 got an error after ~5 mins of starting. Copying from the txt file:

    [Sat May 16 22:45:36 2020]
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    Self-test 192K passed!
    FATAL ERROR: Rounding was 0.5, expected less than 0.4
    Hardware failure detected, consult stress.txt file.
    Self-test 192K passed!

    At which point, I stopped it as instructed.
    I may not have given the other tests enough time. If that's the case, let me know and I'll start them all again.

    Edit: Just to preempt a possible question, I've not messed with overclocking at all, the whole machine should be at default settings.
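
    For longer runs, a log like the one quoted above can be scanned by script rather than by eye. The following is only a minimal sketch: it assumes the line formats seen in this excerpt ("Self-test ... passed!" and "FATAL ERROR: ..."), and the function name and sample text are illustrative, not part of Prime95.

```python
import re

# Sample text copied (and shortened) from the log excerpt quoted in this post.
SAMPLE_LOG = """\
[Sat May 16 22:45:36 2020]
Self-test 192K passed!
Self-test 192K passed!
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected, consult stress.txt file.
Self-test 192K passed!
"""

# Patterns are assumptions based only on the lines shown above.
PASS_RE = re.compile(r"^Self-test (\S+) passed!$")
ERROR_RE = re.compile(r"^FATAL ERROR: (.+)$")


def summarize_log(text):
    """Count passed self-tests and collect FATAL ERROR messages."""
    passed = 0
    errors = []
    for line in text.splitlines():
        line = line.strip()
        if PASS_RE.match(line):
            passed += 1
            continue
        match = ERROR_RE.match(line)
        if match:
            errors.append(match.group(1))
    return {"passed": passed, "errors": errors}


summary = summarize_log(SAMPLE_LOG)
print(summary)
```

    Any non-empty "errors" list is the signal to stop the run, in line with the advice earlier in the thread.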


  6. Posts : 392
    W10
       #6

    An error is an error.
    I presume you were running the Blend test in Prime95 - is that correct?
    If so, then it's likely a RAM issue, despite the RAM having passed MemTest.

    The point now is to figure out which test was right: MemTest86 or Prime95.
    If Prime95 was right, you'll have to figure out which sticks are bad, and whether they're under warranty so you can get them replaced without paying for them. Corsair RAM usually has a lifetime warranty; check with the reseller or Corsair to work out how to exchange it (if necessary).
    If MemTest was right, then the next possible culprit is the CPU.
    You can use this free tool to test the CPU further:
    Intel Processor Diagnostic Tool: The Intel(R) Processor Diagnostic Tool

    There are a few options here for further testing of the memory:
    - testing sticks with Prime95 one by one in the first slot (you choose the slot)
    - running MemTest86+ for many passes on all sticks at once (will require disabling Secure Boot and enabling Legacy booting)
    - running MemTest86+ stick by stick for many passes (will require disabling Secure Boot and enabling Legacy booting)
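
    One way to keep track of stick-by-stick results is sketched below. The interpretation rules are an assumption layered on top of the options listed above (a stick that errors alone is suspect; if every stick errors in the same slot, the slot or CPU becomes more likely than all sticks being bad), and the stick labels are hypothetical.

```python
def interpret_stick_results(results):
    """Interpret single-stick test runs.

    `results` maps a stick label to True (ran clean) or False (errored),
    with each stick tested alone in the same slot.
    """
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        # No stick errored on its own.
        return "no single stick errored; look at the CPU, the slot, or the sticks in combination"
    if len(failed) == len(results):
        # Every stick erroring is more likely a common cause than N bad sticks.
        return "every stick errored; look at the slot, memory controller, or CPU"
    return "suspect stick(s): " + ", ".join(failed)


# Hypothetical labels and outcomes, for illustration only.
print(interpret_stick_results({"stick A": True, "stick B": False}))
```

    This is a bookkeeping aid only; swapping in known-good hardware remains the definitive check.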

    FYI - there are no real "diagnostic" tests for SSDs. Their tests primarily read the software points that are built into the drive (such as S.M.A.R.T. readings, etc.). While these aren't very accurate, they're all we have, so that's why we check them. So SeaTools checks the entire drive for HDDs and only checks the software points for SSDs - that's why the HDDs take longer.


  7. Posts : 6
    Windows 10
    Thread Starter
       #7

    The Intel Diagnostic Tool found no problems.

    About Prime95. Yes, it was a blend test. I ran it again today and got a different error after ~20 mins.

    FATAL ERROR: Resulting sum was 574058671437330.6, expected: 573179062135108.9
    Hardware failure detected, consult stress.txt file.

    Copying from the text box of the test itself in case this helps (Worker #7):

    Test 1, 6000000 Lucas-Lehmer in-place iterations of M104799 using FMA3 FFT length 5K.
    FATAL ERROR: Resulting sum was 574058671437330.6, expected: 573179062135108.9
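
    For a sense of scale, the two sums in that message can be compared directly. This is plain arithmetic on the numbers quoted above and says nothing about which component is at fault; the variable names are illustrative.

```python
# Values copied from the FATAL ERROR line above.
resulting = 574058671437330.6
expected = 573179062135108.9

difference = resulting - expected
relative = difference / expected

print(f"absolute difference: {difference:.1f}")
print(f"relative difference: {relative:.4%}")
```

    The mismatch works out to roughly 0.15% of the expected value, so this was not a last-digit rounding wobble.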

    Another thing I noticed while using CPUID to keep an eye on the temps: the TMPIN4 and TMPIN5 values of the motherboard are consistently much higher than the rest. Could this indicate an overheating problem? The rest seem to be about where they are supposed to be.


  8. Posts : 392
    W10
       #8

    What temps are you seeing, and for which devices?
    In general, lower is better; in normal use the CPU should stay under 60 degrees C.
    With Prime95 running it can go a lot higher. Should it reach the high 90s, it's time to shut down and evaluate whether there's a cooling problem. Most CPUs will automatically shut off if the temp gets much higher than that.

    The GPU (video card) can go a bit higher, but temps in the high 90s are still excessive and cooling problems need to be looked at.

    Other components normally will stay lower than this.
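
    Those rules of thumb can be written down as a small check. This is only a sketch of the guidance above: 95 is an assumed stand-in for "the high 90s", not a manufacturer limit, and the function name and sample readings are illustrative.

```python
# Thresholds paraphrased from the guidance above (degrees C).
NORMAL_USE_CPU_MAX_C = 60   # CPU should stay under this in normal use
SHUTDOWN_REVIEW_C = 95      # assumed value for "the high 90s"


def classify_temp(temp_c, under_stress):
    """Classify a CPU reading as ok, warm, or time to stop and check cooling."""
    if temp_c >= SHUTDOWN_REVIEW_C:
        return "stop and check cooling"
    if not under_stress and temp_c >= NORMAL_USE_CPU_MAX_C:
        return "warm for normal use"
    return "ok"


# Illustrative readings: (temperature, whether a stress test is running).
for reading, stressed in [(45, False), (72, False), (81, True), (97, True)]:
    print(reading, stressed, classify_temp(reading, stressed))
```

    Check the documented limits for the specific CPU or GPU before relying on any fixed number.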

    Again - an error is an error.
    There's a problem with the system.
    When running the Blend test, the most stressed component is the RAM (memory).
    Beyond that, Prime95 also stresses the memory controller (on the CPU in newer chips) and the CPU cache.
    So, your results point to a hardware error. Most likely it's RAM (memory), but it's possible there's a problem with the CPU.

    As you can see, there really isn't any software test that's 100% accurate. We use these tests to help us figure out what's wrong, but in the end you'll likely have to replace hardware (which is the final and definitive test). Do you have any other, known-good memory that you can test with?


  9. Posts : 624
    Windows 10 Pro 21H2 x64
       #9

    Looks like unstable CPU core(s). Getting errors like that isn't normal for a stock CPU!


  10. Posts : 392
    W10
       #10

    Prime95's Blend test primarily stresses RAM (memory).
    Prime95 also stresses the memory controller and CPU cache.
    Both are built into the CPU in newer CPUs. Older systems had the memory controller on the motherboard.

    You may also want to run the Large FFTs test and the Small FFTs test, and compare the time each takes to error.
    If you go to this link ( Prime95 ) and scroll down a bit, you'll see a table that attempts to address which test is best at showing which problem:

    Blend test - memory (RAM)
    Small FFTs - CPU cache
    Large FFTs - memory controller

    Remember that this chart is a patchwork of guesses.
    Only use it to help diagnose the problem - NOT as a basis for replacing parts!!!
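
    As a note-taking aid, the chart above can be kept as a simple lookup. This sketch just restates the three rows, and the function name is illustrative; the same caveat applies, since the output is a hint about where to look and not a verdict.

```python
# The three rows of the chart above, as a lookup table.
TEST_HINTS = {
    "Blend": "memory (RAM)",
    "Small FFTs": "CPU cache",
    "Large FFTs": "memory controller",
}


def suspects(failed_tests):
    """Return the component hinted at by each Prime95 test that errored."""
    return [TEST_HINTS[name] for name in failed_tests if name in TEST_HINTS]


# Blend is the test that errored in this thread.
print(suspects(["Blend"]))
```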


 
