"Your PC encountered a problem and needs to restart." - Round 2.

hbenthow · 10 Mar 2017

BSODHunter said:

What temps do u get when running Prime95?

If I recall correctly, it varied between 74 and 77 degrees Celsius.

hbenthow · 10 Mar 2017

I've recorded a video of Speccy and Windows Task Manager side by side. Do these temperatures seem excessive for the amount of CPU being used?

hbenthow · 20 Mar 2017

I just had another system crash. This time, I was using the computer, and the screen went completely black for a few seconds (without the power light on the case going completely out), then the computer booted back up.

This time, I had the maximum processor state set to 99% in Windows Power Options, which limits my CPU to 3.0 GHz and was preventing an overheating problem I had before (when my processor goes into AMD's Turbo mode, it can get over 70 degrees Celsius). Setting the Windows Power Options value to 99% prevented my CPU from going into Turbo mode, and it stayed in the 40s and lower 50s Celsius thereafter, including at the time of this latest crash. For this reason, I don't think that CPU overheating is a possible cause this time around (although it may have been in the first two crashes).

Here's the BlueScreenView info:

==================================================
Dump File : 032017-29921-01.dmp
Crash Time : 3/20/2017 12:22:49 AM
Bug Check String :
Bug Check Code : 0x00000124
Parameter 1 : 00000000`00000000
Parameter 2 : ffffe784`f93968f8
Parameter 3 : 00000000`00000000
Parameter 4 : 00000000`00000000
Caused By Driver : ntoskrnl.exe
Caused By Address : ntoskrnl.exe+6b3827
File Description : NT Kernel & System
Product Name : Microsoft® Windows® Operating System
Company : Microsoft Corporation
File Version : 10.0.14393.953 (rs1_release_inmarket.170303-1614)
Processor : x64
Crash Address : ntoskrnl.exe+6b3827
Stack Address 1 :
Stack Address 2 :
Stack Address 3 :
Computer Name :
Full Path : C:\WINDOWS\Minidump\032017-29921-01.dmp
Processors Count : 2
Major Version : 15
Minor Version : 14393
Dump File Size : 262,144
Dump File Time : 3/20/2017 12:23:06 AM
==================================================

Also, I've attached the ZIP file containing the information gleaned by the DM Log Collector Tool to this post.
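
For readers who want to pull the key fields out of a BlueScreenView report like the one above, here is a minimal Python sketch. It assumes the "Field : Value" layout shown in this post; the names `parse_report` and `bugcheck_params` are made up for illustration and are not part of BlueScreenView. The only interpretation it adds is that, going by Microsoft's documentation for bug check 0x124, a Parameter 1 of 0 denotes a machine check exception. The lookup table is deliberately partial.

```python
# Excerpt of the BlueScreenView report from this post (fields copied verbatim).
REPORT = """\
==================================================
Dump File : 032017-29921-01.dmp
Crash Time : 3/20/2017 12:22:49 AM
Bug Check String :
Bug Check Code : 0x00000124
Parameter 1 : 00000000`00000000
Parameter 2 : ffffe784`f93968f8
Caused By Driver : ntoskrnl.exe
==================================================
"""

# Partial, illustrative table: only the value seen in this thread is listed.
WHEA_ERROR_SOURCE = {0x0: "machine check exception"}


def parse_report(text):
    """Turn 'Field : Value' lines into a dict; separator lines are skipped."""
    record = {}
    for line in text.splitlines():
        if " :" not in line:
            continue
        key, _, value = line.partition(" :")
        record[key.strip()] = value.strip()
    return record


def bugcheck_params(record):
    """Return Parameters 1-4 as integers, or None where a field is absent."""
    params = []
    for n in range(1, 5):
        raw = record.get(f"Parameter {n}", "")
        # The backtick in the report is only a digit-group separator.
        params.append(int(raw.replace("`", ""), 16) if raw else None)
    return params


record = parse_report(REPORT)
params = bugcheck_params(record)
source = WHEA_ERROR_SOURCE.get(params[0], "not in this partial table")
print(record["Bug Check Code"], "-", source)
```

Parameter 2 is left as a raw number here; making sense of what it points to needs a debugger and the dump file itself.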

BSODHunter · 03 Apr 2017

WHEA Hardware Error again, but this time with a "Bus error":

Code:

0x00000124:    WHEA_UNCORRECTABLE_ERROR (20.03.2017 07:22:49) [Windows 10] 
 
CAUSED BY:     AuthenticAMD    
 
WHEA Notify:   Machine Check Exception 
     Type:     BUS error 
     Error:    BUSLG_OBS_ERR_*_NOTIMEOUT_ERR (Proc 0 Bank 4) 
 
PROCESS:       System 
 
Usual causes:  Hardware, Incompatibility, May be driver corruption

0x124

WHEA_UNCORRECTABLE_ERROR

A 0x124 is one of the worst STOP codes to encounter for the sole reason that the dump files usually give nothing away as to the cause of the problem. WHEA (Windows Hardware Error Architecture) errors signify a problem with hardware but very rarely pinpoint the culprit. In these scenarios it is advised to run a series of hardware stress and diagnostic tests to try and pinpoint the problem. A template is offered below which covers the four main components (GPU, CPU, RAM, HDD) and gives you a fighting chance of narrowing down the problematic device.

Generic "Stop 0x124" Troubleshooting Steps:

1) Ensure that none of the hardware components are overclocked. Overclocking means pushing the components beyond what they were designed for. If you do not know what that is, you are probably not overclocking, so go to the next step.
2) Ensure that the machine is adequately cooled. If this is a laptop, use compressed air to carefully blow out the heat pipe and fan while the computer is turned off. If it is a desktop, take the side cover off and point a fan at the components.
3) Update all hardware-related drivers: video, sound, RAID (if any), etc. Do not rely on Windows when it says the most recent driver is installed. It may be the most recent, but it may also be corrupt.
4) Update the motherboard BIOS according to the manufacturer's instructions and clear the CMOS. Check with the computer maker for directions on this procedure and as usual it is always a good idea to back up your data.
5) Install ALL available Windows updates.
6) Stress test the major components: RAM, CPU, hard drives, etc.
For RAM, use Memtest; instructions can be found here.
Computers are extremely sensitive to problematic RAM, so any error in Memtest should be treated as a problem, and even a clean report from fewer than 8 passes can be a false negative.
For the CPU, use Prime95.
For hard drives, run CHKDSK /F to find and fix problems on the drive(s); add /R to also scan for "bad sectors". You should also go to the HD maker's site and download and run their HD checking utility.
7) Perform a "vanilla" (clean) re-installation of Windows: install nothing that is not from the OS (not even anti-malware software) until you have seen that the computer is not crashing in this state.
When the vanilla installation has run long enough that you are convinced it is OK, start installing updates and applications a few at a time, waiting after each group until you are again convinced it is OK. If the crashes resume, the last group of installations is the likely cause; remove them.
8) Re-seat all connectors, ram modules, etc. You can use the same can of compressed air to clean out the RAM DIMM sockets as much as possible.
Only attempt this if you are FULLY knowledgeable about the procedures.
9) If all else fails, start removing items of hardware one-by-one in the hope that the culprit is something non-essential which can be removed.

Diagnostic Test

RAM TEST

Run MemTest86+ to analyse your RAM. MemTest86+ - Test RAM - Windows 10 Forums

   Note

MemTest86+ needs to be run for at least 8 complete passes for conclusive results. Set it running before you go to bed and leave it overnight. We're looking for zero errors here. Even a single error will indicate RAM failure.

Take a photo of the result and post it.
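
The pass/error rule in the note above can be written down as a small decision function. This is only a sketch of that rule in Python; `memtest_verdict` and `MIN_PASSES` are hypothetical names, and the numbers have to be read off the MemTest86+ screen by hand.

```python
MIN_PASSES = 8  # threshold quoted in the note above


def memtest_verdict(passes, errors):
    """Apply the rule from the note: any error fails, too few passes proves nothing."""
    if errors > 0:
        return "faulty: even a single error counts"
    if passes < MIN_PASSES:
        return "inconclusive: fewer than 8 complete passes"
    return "no fault found"


print(memtest_verdict(passes=3, errors=0))
print(memtest_verdict(passes=8, errors=0))
print(memtest_verdict(passes=1, errors=2))
```

Errors are checked first on purpose: an error after one pass is already conclusive, while a clean result only counts once the pass threshold is reached.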

Diagnostic Test

GPU TEST

Run Furmark to stress test your GPU. FurMark - GPU Stress Test - Windows 10 Forums

   Warning

Your GPU temperatures will rise quickly while Furmark is running. Keep a keen eye on them and abort the test if overheating occurs.

Diagnostic Test

CPU TEST

Run Prime95 to stress test your CPU. Prime95 - Stress Test Your CPU - Windows 10 Forums

   Warning

Your CPU temperatures will rise quickly while under this stress test. Keep a keen eye on them and abort the test if overheating occurs.

Diagnostic Test

HDD TEST

Run HDTune to check health and scan for errors.

   Note

It may take some time, but look for signs of any errors or failure.

You can also run a disk check using chkdsk. Refer to the tutorial for details of how to do this.

hbenthow · 03 Apr 2017

BSODHunter said:

WHEA Hardware Error again, but this time with a "Bus error":

Code:

0x00000124:    WHEA_UNCORRECTABLE_ERROR (20.03.2017 07:22:49) [Windows 10] 
 
CAUSED BY:     AuthenticAMD    
 
WHEA Notify:   Machine Check Exception 
     Type:     BUS error 
     Error:    BUSLG_OBS_ERR_*_NOTIMEOUT_ERR (Proc 0 Bank 4) 
 
PROCESS:       System 
 
Usual causes:  Hardware, Incompatibility, May be driver corruption

As I mentioned earlier in this thread, I have already run Furmark, CHKDSK, Prime95, etc, all showing no signs of a hardware problem. Since I posted my previous post here (but before you posted your most recent reply), I decided to uninstall Bitdefender, as I have heard claims that it can occasionally cause errors usually associated with hardware problems. I had originally wanted to put off uninstalling it, but have now decided that it's better to try to find out if the errors go away without Bitdefender before trying anything more drastic.

As usual, it's a waiting game, as there's no way to tell if a problem still exists unless it either reappears or ceases to cause trouble for so long that it's unlikely to still be around.

What is the particular significance of the error being labeled as a BUS error this time? Is there anything about it being a BUS error that could, in and of itself, lead to a clue as to the origin of the problem?

axe0 · 05 Apr 2017

To understand what the error means, we need to decipher it a little.
BUSLG_OBS_ERR_*_NOTIMEOUT_ERR
LG stands for "level generic". This field gives the cache level of the CPU where the error was noticed; LG means that the cache level couldn't be determined at the time of the crash.
OBS means that the CPU core that noticed the error observed it as a third party.

What this means is that the CPU picked up the error but wasn't involved in sending or receiving the instructions.
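
To make the deciphering above a little more concrete, here is a rough Python sketch that splits such a mnemonic into its fields. The meanings given for LG and OBS come from this post; the field layout (level, participation, request type, target, timeout) and the other level and participation codes are assumptions based on the machine-check error code tables in the CPU vendors' manuals, and `decode_bus_error` is a made-up helper name.

```python
import re

# LG and OBS are explained in the post above; the other entries are assumptions.
LEVEL = {
    "L0": "level 0",
    "L1": "level 1",
    "L2": "level 2",
    "LG": "generic (level could not be determined)",
}
PARTICIPATION = {
    "SRC": "this core originated the request",
    "RES": "this core responded to the request",
    "OBS": "this core observed the error as a third party",
    "*": "generic",
}

# Assumed layout: BUS<level>_<participation>_<request>_<target>_<timeout>_ERR
PATTERN = re.compile(
    r"^BUS(?P<level>L0|L1|L2|LG)"
    r"_(?P<participation>[A-Z]+|\*)"
    r"_(?P<request>[A-Z]+)"
    r"_(?P<target>[A-Z]+|\*)"
    r"_(?P<timeout>TIMEOUT|NOTIMEOUT)_ERR$"
)


def decode_bus_error(mnemonic):
    """Split a bus-error mnemonic into named fields; raise if it does not fit."""
    match = PATTERN.match(mnemonic)
    if match is None:
        raise ValueError(f"unrecognised mnemonic: {mnemonic!r}")
    return {
        "level": LEVEL[match["level"]],
        "participation": PARTICIPATION.get(match["participation"], "unknown"),
        "request": match["request"],   # left undecoded on purpose
        "target": match["target"],     # left undecoded on purpose
        "timed_out": match["timeout"] == "TIMEOUT",
    }


for field, value in decode_bus_error("BUSLG_OBS_ERR_*_NOTIMEOUT_ERR").items():
    print(f"{field}: {value}")
```

The request and target fields are returned as-is rather than guessed at, since this thread only explains the level and participation parts.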

hbenthow · 05 Apr 2017

axe0 said:

To understand what the error means, we need to decipher it a little.
BUSLG_OBS_ERR_*_NOTIMEOUT_ERR
LG stands for "level generic". This field gives the cache level of the CPU where the error was noticed; LG means that the cache level couldn't be determined at the time of the crash.
OBS means that the CPU core that noticed the error observed it as a third party.

What this means is that the CPU picked up the error but wasn't involved in sending or receiving the instructions.

I'm not very well-versed in this sort of highly technical information. Could you please explain the significance of that? Does it mean that the CPU itself is not to blame?

axe0 · 05 Apr 2017

It basically means that the CPU isn't to blame; it caught the error but wasn't doing much more than watching.

hbenthow · 05 Apr 2017

axe0 said:

It basically means that the CPU isn't to blame; it caught the error but wasn't doing much more than watching.

In such cases, what is the most likely culprit?

Could Bitdefender have possibly caused the crashes?

Could the fact that my RAM sticks run at two different speeds (one 667 MHz and one 800 MHz), which causes my system to underclock the faster one to the speed of the slower one, have anything to do with it?

axe0 · 05 Apr 2017

I can't say; I've only seen a single dump in the latest zip. With 0x124, multiple dumps are needed before a pattern can emerge that might identify the culprit.
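
As a rough illustration of what looking for patterns across dumps could involve, the Python sketch below tallies dumps by an error "signature" and reports only the signatures that recur. The choice of signature (bug check code, error mnemonic, processor bank) and the names `signature`, `tally` and `recurring` are assumptions made for this example, not an established method; the single record is the dump analysed earlier in this thread.

```python
from collections import Counter

# The one dump analysed in this thread.
DUMPS = [
    {
        "file": "032017-29921-01.dmp",
        "code": "0x00000124",
        "error": "BUSLG_OBS_ERR_*_NOTIMEOUT_ERR",
        "bank": 4,
    },
]


def signature(dump):
    """Fields assumed to identify 'the same kind' of crash."""
    return (dump["code"], dump["error"], dump["bank"])


def tally(dumps):
    """Count how many dumps share each signature."""
    return Counter(signature(d) for d in dumps)


def recurring(dumps, min_count=2):
    """Signatures seen at least min_count times, most frequent first."""
    return [sig for sig, n in tally(dumps).most_common() if n >= min_count]


print(tally(DUMPS))
print(recurring(DUMPS))
```

With only one dump, nothing can recur, so the second line prints an empty list; a pattern only becomes visible once several dumps share a signature.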