Recurring BSOD WHEA_UNCORRECTABLE_ERROR frequently but inconsistent


  1. Posts : 16
    Windows 10
       #1

    Recurring BSOD WHEA_UNCORRECTABLE_ERROR frequently but inconsistent


    I have been getting the WHEA_UNCORRECTABLE_ERROR BSOD for years with my PC. Over the years I have swapped out every part except for the CPU and PSU. The same BSOD keeps occurring, but it seems inconsistent as to when it will happen. It happens a couple of times per week, even under low load like browsing the internet. With some games (BF3, GTA IV) it happens very frequently (every 20 minutes), while I have been able to stream with OBS for 12 hours without a BSOD or play other games for a long time without issues. About 6 months ago I did a clean install of Windows 10 on a new SSD and the issue remained as well.

    I would like to know if the CPU is the issue because all other parts have been replaced over time.



    Attachment 245015


  2. Posts : 5,330
    Windows 11 Pro 64-bit
       #2

    A machine-check exception (MCE) is a type of computer hardware error that occurs when a computer's central processing unit detects an unrecoverable hardware error in the processor itself, the memory, the I/O devices or on the system bus. It is not caused by software. It looks like the graphics card is causing the BSOD.

    If you haven't already, install the latest version of the graphics card driver.

    Check the C: partition for file system and bad sector errors.

    1. Start the Command Prompt as an administrator.

    2. In the Command Prompt, type chkdsk C: /r and press Enter to schedule an error check of the C: partition for the next time the system reboots.



    Repair any Corrupted Windows System Files

    1. Start the Command Prompt as an administrator.

    2. In the Command Prompt, type sfc /scannow and press Enter.




    Important Recommended Steps

    • Update all installed applications.
    • Uninstall the currently installed device drivers and then install the latest versions.
    • Install all important Windows updates.
    • If you are overclocking (pushing the components beyond their design) you should revert to default at least until the crashing is solved. If you don't know what it is you probably are not overclocking.
    • Use SpeedFan to monitor system temperatures; overheating can cause a BSOD.
    • Use a Memtest86+ disc to check the system memory (RAM) for errors.
    • Use CrystalDiskInfo to check the SMART health report of the hard disk drive (HDD).
    • Use Prime95 to stress test your CPU.
    • Use 3DMark to stress test your GPU.


    Follow the instructions at these links to run hardware diagnostics:
    - Diagnostics- Hardware Diagnostics
    - Hardware Stripdown Troubleshooting


  3. Posts : 14,046
    Windows 11 Pro X64 22H2 22621.1848
       #3

    I think your CPU is defective. All 5 dumps are identical:
    Code:
    WHEA_UNCORRECTABLE_ERROR (124)
    A fatal hardware error has occurred. Parameter 1 identifies the type of error
    source that reported the error. Parameter 2 holds the address of the
    WHEA_ERROR_RECORD structure that describes the error conditon.
    Arguments:
    Arg1: 0000000000000000, Machine Check Exception
    Arg2: ffffd80fedd65028, Address of the WHEA_ERROR_RECORD structure.
    Arg3: 00000000fe800030, High order 32-bits of the MCi_STATUS value.
    Arg4: 0000000000000135, Low order 32-bits of the MCi_STATUS value.
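
    For reference, Arg3 and Arg4 above can be combined into the full 64-bit MCi_STATUS value and decoded by hand. The following is only a rough sketch: it assumes the generic x86 machine-check layout (status flags in bits 63 to 57, MCA error code in the low 16 bits, cache errors encoded as 0000 0001 RRRR TTLL), ignores all vendor-specific bits, and uses a made-up function name. The processor vendor's manual remains the authority on the exact encoding.

```python
# Sketch: decode the MCi_STATUS value reported in Arg3/Arg4 of the bugcheck.
# Assumption: generic x86 machine-check layout; vendor-specific bits are ignored.

# Status flags that sit in the top bits of MCi_STATUS.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Sub-fields of a cache hierarchy error code (0000 0001 RRRR TTLL).
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "I", 1: "D", 2: "G"}  # instruction, data, generic
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_mci_status(high, low):
    """Combine the two 32-bit halves and pick out the common fields."""
    status = (high << 32) | low
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code
    result = {"status": status, "flags": flags, "mca_code": code}
    if code >> 8 == 0x01:  # cache hierarchy error pattern
        result["request"] = REQUEST.get((code >> 4) & 0xF, "?")
        result["transaction"] = TRANSACTION.get((code >> 2) & 0x3, "?")
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    # Arg3 (high half) and Arg4 (low half) from the bugcheck arguments.
    info = decode_mci_status(0xFE800030, 0x00000135)
    print(hex(info["status"]), info["flags"])
    print(info.get("request"), info.get("transaction"), info.get("level"))
```

    With the values from this dump, the low 16 bits (0x0135) decode to a data read (DRD) on a data transaction (D) at level L1, and the UC and PCC flags are both set.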
    Code:
    6: kd> !errrec ffffd80fedd65028
    ===============================================================================
    Common Platform Error Record @ ffffd80fedd65028
    -------------------------------------------------------------------------------
    Record Id     : 01d55ea68a85fe70
    Severity      : Fatal (1)
    Length        : 936
    Creator       : Microsoft
    Notify Type   : Machine Check Exception
    Timestamp     : 8/29/2019 20:40:58 (UTC)
    Flags         : 0x00000000
    
    ===============================================================================
    Section 0     : Processor Generic
    -------------------------------------------------------------------------------
    Descriptor    @ ffffd80fedd650a8
    Section       @ ffffd80fedd65180
    Offset        : 344
    Length        : 192
    Flags         : 0x00000001 Primary
    Severity      : Fatal
    
    Proc. Type    : x86/x64
    Instr. Set    : x64
    Error Type    : Cache error
    Operation     : Data Read
    Flags         : 0x00
    Level         : 1
    CPU Version   : 0x0000000000600f12
    Processor ID  : 0x0000000000000006
    
    ===============================================================================
    Section 1     : x86/x64 Processor Specific
    -------------------------------------------------------------------------------
    Descriptor    @ ffffd80fedd650f0
    Section       @ ffffd80fedd65240
    Offset        : 536
    Length        : 128
    Flags         : 0x00000000
    Severity      : Fatal
    
    Local APIC Id : 0x0000000000000006
    CPU Id        : 12 0f 60 00 00 08 08 06 - 0b 22 98 1e ff fb 8b 17
                    00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                    00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
    
    Proc. Info 0  @ ffffd80fedd65240
    
    ===============================================================================
    Section 2     : x86/x64 MCA
    -------------------------------------------------------------------------------
    Descriptor    @ ffffd80fedd65138
    Section       @ ffffd80fedd652c0
    Offset        : 664
    Length        : 272
    Flags         : 0x00000000
    Severity      : Fatal
    
    Error         : DCACHEL1_DRD_ERR (Proc 6 Bank 0)
      Status      : 0xfe80003000000135
      Address     : 0x0000000013196930
      Misc.       : 0x0000000000000000

    Definitely run all the tests that FreeBooter recommended, but I think a replacement CPU will resolve the problem. I can't guarantee it, but the odds are pretty good. In every dump, GTAIV.exe was running when the failure occurred, so there is a possibility of a problem there. Maybe check the GTA forums to see if there are any reports of WHEA 124 problems?
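
    As a side note, the "CPU Version" value in the error record (0x600f12) can be split into family, model and stepping. This is a minimal sketch, assuming that field holds the standard CPUID signature (EAX of CPUID leaf 1); the function name is made up for illustration.

```python
# Sketch: split a CPUID signature into family / model / stepping.
# Assumption: "CPU Version" in the error record is EAX from CPUID leaf 1.

def decode_cpuid_signature(eax):
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model applies for base family 0xF (and 0x6 on Intel parts).
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    family, model, stepping = decode_cpuid_signature(0x00600F12)
    print(f"family {family:#x}, model {model:#x}, stepping {stepping}")
```

    For 0x600f12 this gives family 0x15, model 0x1, stepping 2. Family 15h is the AMD family that includes the FX processors.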

    Add Driver Verifier to the list of tests to run:
    Run Driver Verifier

    Enable and Disable Driver Verifier in Windows 10

    What we're looking for is a verifier-generated BSOD with a mini dump that will tell us what driver caused it. If you get a BSOD, rerun the V2 log collector as soon as possible and upload the resulting zip file. Also see if there is a new C:\Windows\MEMORY.DMP file. If there is, copy it to another location, then zip it and upload it to a file sharing site like OneDrive and post a link to it here.


  4. Posts : 392
    W10
       #4

    As Windows has become more complicated, so have the BSODs.
    In the old days (before Vista), the equivalent error was a STOP 0x9C (MACHINE_CHECK_EXCEPTION).
    Vista brought in the STOP 0x124 error (WHEA_UNCORRECTABLE_ERROR), which is slightly different from the STOP 0x9C (due to the advances in technology, both with hardware and with Windows).

    Now Windows has WHEA (Windows Hardware Error Architecture). In the early days of this type of error, it was assumed that the error was always due to a hardware problem. But shortly after the errors started being seen, it was determined that this could also be caused by low-level drivers or compatibility issues (see this topic for some links that show this: BSOD Often & freezes).

    In short, WHEA allows attached hardware to send error reports to Windows (if it's compatible and enabled).
    It's most common that you'll see the error coming from the CPU, but it can come from any device that reports to Windows (including the video card and the buses on the motherboard).
    I've worked with these errors for years, and find that even debugging isn't very helpful.
    WHEA only reports that an error occurred. It's then up to the user to figure out what the problem is and how to fix it.

    In this case, there's some evidence of an error in the CPU cache. While this does not necessarily mean that the CPU cache is at fault, it's wise to check the cache to see if it does have a problem. I find the best way to check the cache is by using Prime95 to test: Prime95 - Stress Test Your CPU

    See this link (towards the bottom of the page) for a discussion of the different tests and what to use them for: Prime95
    In this case, I suggest running the Small FFT's test first - then the Blend and then the Large FFT's (just to be sure).

    At work we figure it out this way:
    1 - backup the user data
    2 - update the BIOS/UEFI to the latest, W10 compatible version
    3 - run hardware diagnostics (to quickly rule out some potential hardware problems)
    We do anti-malware scans; MemTest86+; Seagate Seatools; and a proprietary system scan (similar to the Dell and HP diagnostics)
    If all tests pass, then we move on to the next step.....
    4 - clean install Windows and see if the BSOD's recur
    If they do, then it's a hardware problem
    If they don't, then it was a software problem (and it was fixed with the clean install)
    5 - then update everything from the OEM manufacturer's website and the support site for all attached devices.
    6 - install 3rd party software as desired.

    Good luck!


  5. Posts : 16
    Windows 10
    Thread Starter
       #5

    Firstly, thank you all for the suggestions. I will respond briefly to each of your posts.

    Freebooter:
    The latest component change in the system has been the GPU. I switched from an HD7870 to a GTX970. This is my third graphics card in all the years I've had this system. I have also been getting this same BSOD error over all these years. The only original components in my system are my CPU, PSU and HDD (though I have my OS installed on an SSD). My GPU driver is up to date. Last week I applied some fresh thermal paste to my CPU and cleaned the excess dust out of the system. Temps are looking fine in Speccy while playing different games.

    I will however try all the tests you've suggested just to rule anything out.


    Ztruker:
    Thank you for your input as well. The last couple of BSODs happened while playing GTA IV. So far not many games cause the same BSOD that frequently, but GTA IV and BF3 are two of them. With games like DayZ, Assetto Corsa and World of Tanks the frequency of BSODs doesn't increase, but they can still happen, as they can with any game. The same BSOD can also happen when doing photo editing, playing Spotify or watching a video on YouTube. Or even when just browsing through some folders. It is very inconsistent in that regard.

    One question: which dump file setting should I select? It was on Small memory dump, but I've set it to Complete memory dump for now. I'll try out your test as well!

    jdc1:
    Thank you for your comprehensive explanation. I will try the different stress tests with Prime95. I've had this exact BSOD issue for years now. A couple of months ago I bought a new SSD and copied the partition of the old SSD over to the new one. I still got the same BSODs, so I decided to do a clean install on the new SSD. It hasn't made any difference because the same BSOD returns.

    But basically what you're saying is: even with all the stress tests, diagnostics and dump files that can be analysed, it's still not possible to say with certainty which piece of hardware causes the issue, right? As I said, the only original components in my system are the PSU and CPU, and the BSOD has stayed the same over the years, so I want to make sure it's the CPU, which I can replace in that case.


    I will report back to the thread after I've done a load of tests! Thanks once again for your time in helping me!


  6. Posts : 392
    W10
       #6

    All hardware "diagnostics" are merely software tests of a hardware component.
    They are not 100% reliable (see note at the bottom for more details of my experiences) - but they are useful for helping to rule out/in problems and to cut down on the time you spend troubleshooting/repairing.

    The only 100% reliable test (IMO) is to either remove the component from the system (to test), or to try another component in its place (to test). But that gets expensive if you're just randomly replacing parts! And that's why we ask you to run all these tests.

    Even when we narrow down the problems to the specific piece of hardware that we believe is to blame - we still suggest either borrowing another piece from a friend, or purchasing one from a shop that will let you return it for your money back (if it's not needed).

    NOTES:
    1) Several things about software testing.
    I've been working on PCs since the days of DOS, and have been working as a PC tech for 14 years (in a shop that has, on average, 20 - 30 computers being worked on every day):
    - I have seen many, many drives pass S.M.A.R.T. tests - but the drive is still bad.
    - I have seen hard drives pass other diagnostics and still be bad. And have seen them fail diagnostics and still be good.
    - I have seen bad hard drives cause software and hardware problems with other devices (to include diagnostics).
    - I have seen video cards fail diagnostics and still be good - and pass diagnostics and still be bad (the worst caused me to get 2nd degree burns when the GPU temp sensor had failed!)
    - I have seen (only once) an instance where MemTest86+ passed RAM that had faults in it (found with another tester).
    - I have even seen a clean install of Windows complete without problems (a good clean install) even though there was a hardware problem (we suppose it's because the Microsoft drivers weren't as complex as the 3rd party drivers for that same device, but that's just a guess on our part).

    2) Troubleshooting usually starts with software troubleshooting - as it's (in most cases) free, it takes less time than hardware troubleshooting, and it's fairly straightforward. The easiest software troubleshooting is a clean install of Windows without any unneeded 3rd party drivers. At work we like to use the recovery media supplied by the OEM manufacturer or the default Windows installer for the latest version of Windows that the system supports.

    A clean install does a few things:
    - rules out corruption of Windows files (by replacing them)
    - rules out 3rd party software problems (by removing them)
    - rules out Windows problems (by reinstalling Windows the way that the OEM/Microsoft intended)
    - and it helps to rule in/out hardware problems (if the clean install fixes things, then it's a software problem. If it doesn't, then it's a hardware problem).
    BUT, remember, software tests aren't 100% reliable - so further hardware testing is needed at that point.


  7. Posts : 2,585
    Win 11
       #7

    I had that error on an i7 3770 CPU. It passed all diagnostics including Intel's CPU diagnostic. As I do some PC repair I had spare parts and changed everything (motherboard, power supply, memory, video) and still had the problem. I didn't have another i7 3770 but had an i5 3550 and I replaced the i7 3770 and the problem went away. I finally convinced Intel it had a problem and they replaced the i7 3770 under warranty.


  8. Posts : 16
    Windows 10
    Thread Starter
       #8

    So I checked my SSDs (one has the OS) and my HDD for errors. The tests reported no errors. Same with MemTest: it didn't find anything wrong. I also cleaned the dust out of my case and applied a new layer of thermal paste to the CPU because it had been a while since I last did that. Temps stay cool (47 degrees C) even under full load.

    I also did the Small FFT test with Prime95 and there I ran into issues. After 10 to 15 minutes, my PC would freeze and whatever I tried, nothing could undo that, except for resetting/power off.

    I set the dump file settings to full memory dump (it was on small memory dump) but I noticed there is only one file and that it's 9GB. What dump file setting do you guys recommend? I included a new log collector file.
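
    For reference, the active dump setting can also be read from the registry instead of the System Properties dialog. This is a minimal sketch, assuming the standard CrashControl key and the usual CrashDumpEnabled values (0, 1, 2, 3, 7); the function names are made up for illustration, and the registry read only works on Windows.

```python
# Sketch: report which crash dump type Windows is configured to write.
# Assumption: the standard CrashControl registry key and CrashDumpEnabled values.
import sys

DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
    7: "Automatic memory dump",
}


def describe_dump_setting(value):
    """Map a CrashDumpEnabled value to a readable name."""
    return DUMP_TYPES.get(value, f"Unknown ({value})")


def read_dump_setting():
    """Read CrashDumpEnabled from the registry (Windows only)."""
    import winreg  # imported here so the module still loads on other systems
    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if __name__ == "__main__" and sys.platform == "win32":
    print(describe_dump_setting(read_dump_setting()))
```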

    No website sells AM3+ CPUs under warranty, so I don't want to take the risk of buying another AMD FX processor yet if I'm not fully certain it's the CPU. With the new info, can you guys give more certainty as to whether the CPU is the cause of the BSODs?

    Attachment 247158


 

Windows 10 Forums