Volmgr 161 + WHEA


  1. Posts : 95
    Windows 10
    Thread Starter
       #141

    Well I think we went just about a week with no BSOD... and then it happened. Logs and dump file attached below.

    PSU is supposed to arrive later today, just want to make sure these are the same crashes/logs we've been seeing.

    apps+sys.zip - Google Drive Apps+sys

    MEMORY.DMP - Google Drive dump file


  2. Posts : 402
    Windows 10 and Windows 11
       #142

    What was different about what you were doing when the recent BSOD happened? Anything?

    In the System log I can see the bugcheck (0x124 again) and two other unexpected power off events (EventID 41) on 24th, but there is nothing else in there to indicate why it's failed - so a hardware event is most likely. There is nothing of interest in the Application log.

    I'm downloading the kernel dump but since I already know it's a 0x124 (WHEA_UNCORRECTABLE_ERROR) I don't expect it to reveal anything new. I'll update this post if there is.

    More of the same.

    Later Edit: After looking at the kernel dump there is a small (and probably insignificant) difference in this dump. The other 0x124 dumps had an argument 1 value of 0x0, which is a machine check exception - a hardware failure. This dump has an argument 1 value of 0x10, which is a device failure.
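As a rough illustration of the distinction described above, here is a small Python sketch that maps the argument 1 value of a 0x124 bugcheck to the two meanings mentioned in this post. Only 0x0 and 0x10 come from this thread; the names `WHEA_ARG1_MEANINGS` and `describe_whea_arg1` are made up for the example, and any other value is deliberately left unmapped rather than guessed.

```python
# Only the two Arg1 values discussed in this thread are mapped;
# anything else is reported as unmapped rather than guessed.
WHEA_ARG1_MEANINGS = {
    0x00: "machine check exception (hardware failure)",
    0x10: "device driver error source (device failure)",
}

def describe_whea_arg1(arg1_text):
    """Turn the Arg1 hex string from a 0x124 bugcheck into a short description."""
    value = int(arg1_text, 16)
    meaning = WHEA_ARG1_MEANINGS.get(value, "not mapped in this sketch")
    return f"Arg1 = {value:#x}: {meaning}"

# Arg1 as printed by the debugger for the recent dump, then the earlier dumps.
print(describe_whea_arg1("0000000000000010"))
print(describe_whea_arg1("0"))
```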

    The call stack again shows that the problem happened during an NVMe SSD device access, the same as we've already seen...
    Code:
    9: kd> !dpx
    Start memory scan  : 0xffff8006e78c2388 ($csp)
    End memory scan    : 0xffff8006e78c3000 (Kernel Stack Base)
    
                   rsp : 0xffff8006e78c2388 : 0xfffff8013a7b9abc : nt!WheaReportHwError+0x3ec
    0xffff8006e78c2388 : 0xfffff8013a7b9abc : nt!WheaReportHwError+0x3ec
    0xffff8006e78c2418 : 0xfffff801372910a4 : PSHED!PshedRetrieveErrorInfo+0x94
    0xffff8006e78c2428 : 0xfffff8013a40e1c9 : nt!EtwWriteEx+0x119
    0xffff8006e78c2438 : 0xfffff8013a7ba4f9 : nt!WheaHwErrorReportSetSeverityDeviceDriver+0x9
    0xffff8006e78c2460 : 0xfffff8013d299940 : storport!WheaErrorData
    0xffff8006e78c2468 : 0xfffff8013a7ba619 : nt!WheaHwErrorReportSubmitDeviceDriver+0xe9
    0xffff8006e78c2498 : 0xfffff8013a7ba735 : nt!WheaReportFatalHwErrorDeviceDriverEx+0xf5
    0xffff8006e78c24f8 : 0xfffff8013d273721 : storport!StorpWheaReportError+0x9d
    0xffff8006e78c2548 : 0xfffff8013d292350 : storport!g_StorpSourceGuid
    0xffff8006e78c2558 : 0xfffff8013d2735d0 : storport!StorpPopulateErrorData+0x120
    0xffff8006e78c2588 : 0xfffff8013d26cb00 : storport!StorpMarkDeviceFailed+0x358
    0xffff8006e78c2608 : 0xfffff8013a47f9c2 : nt!KiInsertQueueDpc+0x332
    0xffff8006e78c2658 : 0xfffff8013d312760 :  !du ""Controller Reset failed due to surprise remove""
    0xffff8006e78c2690 : 0xffff898c173f51a8 :  !du "stornvme"
    0xffff8006e78c2710 : 0xfffff8013d2990ff : storport!SrbShimHooks+0xf
    0xffff8006e78c2728 : 0xfffff8013d2688ed : storport!StorEtwMiniportLogError+0x255
    0xffff8006e78c2760 : 0xffff898c173f51a8 :  !du "stornvme"
    0xffff8006e78c2798 : 0xfffff8013d312010 :  !du ""Controller Fatal Status is set""
    0xffff8006e78c27b8 : 0xfffff8013d312010 :  !du ""Controller Fatal Status is set""
    0xffff8006e78c27c8 : 0xfffff8013d312010 :  !du ""Controller Fatal Status is set""
    0xffff8006e78c27e0 : 0xfffff8013af25440 : nt!ExNode0
    0xffff8006e78c2818 : 0xfffff8013d23f07c : storport!StorPortNotification+0x91c
    0xffff8006e78c2820 : 0xfffff8013d299000 : storport!WPP_GLOBAL_Control
    0xffff8006e78c2840 : 0xfffff8013d312760 :  !du ""Controller Reset failed due to surprise remove""
    0xffff8006e78c2848 : 0xfffff8013d240000 : storport!StorPortExtendedFunction+0x9b0
    0xffff8006e78c2868 : 0xfffff8013d2fb47e : stornvme!IsInternalSrb+0x16
    0xffff8006e78c2880 : 0xffff8006e78c2898 : 0xfffff8013d3018a3 : stornvme!NVMeRequestComplete+0x27
    0xffff8006e78c2898 : 0xfffff8013d3018a3 : stornvme!NVMeRequestComplete+0x27
    0xffff8006e78c28a8 : 0xfffff8013af25440 : nt!ExNode0
    0xffff8006e78c28e8 : 0xfffff8013d2fc33d : stornvme!ControllerReset+0x1a1
    0xffff8006e78c2918 : 0xfffff8013d312760 :  !du ""Controller Reset failed due to surprise remove""
    0xffff8006e78c2968 : 0xfffff8013d2ff55a : stornvme!NVMeControllerReset+0x10a
    0xffff8006e78c2998 : 0xfffff8013d2fe4af : stornvme!NVMeControllerAsyncResetWorker+0x3f
    0xffff8006e78c29c8 : 0xfffff8013d26a1f6 : storport!StorPortWorkItemRoutine+0x46
    0xffff8006e78c29f0 : 0xfffff8013d26a1b0 : storport!StorPortWorkItemRoutine
    0xffff8006e78c29f8 : 0xfffff8013a443f85 : nt!IopProcessWorkItem+0x135
    0xffff8006e78c2a18 : 0xfffff80151695440 : afd!AfdDoWork
    0xffff8006e78c2a28 : 0xfffff8013a443e50 : nt!IopProcessWorkItem
    0xffff8006e78c2a68 : 0xfffff8013a48e5c5 : nt!ExpWorkerThread+0x105
    0xffff8006e78c2a80 : 0xfffff8013a443e50 : nt!IopProcessWorkItem
    0xffff8006e78c2a98 : 0xfffff8013a4df399 : nt!KiUpdateSpeculationControl+0x49
    0xffff8006e78c2aa8 : 0xffff898c156833f0 : 0xffff898c156cbb20 : 0xfffff8013ae50c00 : nt!MiSystemPartition
    0xffff8006e78c2af0 : 0xfffff8013a48e4c0 : nt!ExpWorkerThread
    0xffff8006e78c2b08 : 0xfffff8013a5265f5 : nt!PspSystemThreadStartup+0x55
    0xffff8006e78c2b58 : 0xfffff8013a6048d8 : nt!KiStartSystemThread+0x28
    0xffff8006e78c2b70 : 0xfffff8013a5265a0 : nt!PspSystemThreadStartup
    The difference that the 0x10 argument 1 exception makes is that in this case argument 4 contains the device driver error source. This turns out to be the address of the device node (rather than the device object, which is what it seemed to be). Displaying that device node gives...
    Code:
    9: kd> !devnode ffff898c1741b1a0
    DevNode 0xffff898c1741b1a0 for PDO 0xffff898c174790a0
      Parent 0xffff898c173f47c0   Sibling 0000000000   Child 0xffff898c1741b050   
      InterfaceType 0  Bus Number 0
      InstancePath is "\Device\RaidPort1"
      ServiceName is ""
      TargetDeviceNotify List - f 0x100ff0002  b 0xfffff8013d2f5ed0
      State = Unknown State (0x0)
      Previous State = Unknown State (0x1569f870)
      StateHistory[11] = Unknown State (0x80000)
      StateHistory[10] = Unknown State (0x1)
      StateHistory[09] = Unknown State (0xffffffed)
      StateHistory[08] = Unknown State (0x0)
      StateHistory[07] = Unknown State (0x5)
      StateHistory[06] = Unknown State (0x3)
      StateHistory[05] = Unknown State (0xe0)
      StateHistory[04] = Unknown State (0xffff898c)
      StateHistory[03] = Unknown State (0x1741b1a0)
      StateHistory[02] = Unknown State (0xffff898c)
      StateHistory[01] = Unknown State (0x156a0bf0)
      StateHistory[00] = Unknown State (0xffff898c)
      StateHistory[19] = Unknown State (0x1749d6a0)
      StateHistory[18] = Unknown State (0x2)
      StateHistory[17] = Unknown State (0x3)
      StateHistory[16] = Unknown State (0x0)
      StateHistory[15] = Unknown State (0x0)
      StateHistory[14] = Unknown State (0xffffffff)
      StateHistory[13] = Unknown State (0xffffffff)
      StateHistory[12] = Unknown State (0x201)
      Flags (0000000000)  
      UserFlags (0xffffff01)  DNUF_WILL_BE_REMOVED
                              Unknown flags 0xffffff00
      CapabilityFlags (0x17419010)  Removable, WakeFromD2, 
                                    NonDynamic, WarmEjectSupported
                                    Unknown flags 0x17400000
      DisableableDepends = 65792 (from children)
    This device node is for \Device\RaidPort1 then, and RAID is not something you're using? Do you have any RAID options set in the BIOS? Is the Intel Rapid Storage Technology (IRST) driver installed? To check, open Device Manager, expand the section Storage Controllers and if you have an entry called Intel® Chipset SATA/PCIe RST Premium Controller, then Intel RST is installed. AFAIK you only need Intel RST for RAID support and Optane memory support.
    Last edited by ubuysa; 24 May 2023 at 04:23.


  3. Posts : 95
    Windows 10
    Thread Starter
       #143

    I don't have anything RAID set up. My SATA controller in BIOS is set to enabled by default, but I haven't set any RAID volume or anything. The SATA mode selection is set to AHCI and not Intel RST. Do I need to change that?


  4. Posts : 402
    Windows 10 and Windows 11
       #144

    No, that sounds all good to me. It's probably just an internal Windows device.

    - - - Updated - - -

    I've done some more research this morning, based on that \Device\RaidPort1 device mentioned in the recent dump...

    It seems that when an I/O request is sent to the miniport driver for a storage device (like your NVMe drive), Windows puts it on a pending queue, along with other requests for the same device. When an I/O request completes, it is removed from this queue. Whilst requests are on the queue they are timed, and the timer is reset as each request reaches the head of the queue. If this timer expires then Windows assumes the device has stopped responding. According to the documentation I'm reading, when this timer expires an error 129 record is written to the event log. There is no mention of it causing a BSOD, though if the I/O in question was a critical system I/O it could cause a BSOD I suppose?
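To make the timer behaviour described above a little more concrete, here is a toy Python model. It is only a sketch of the idea as described in this post, not how storport actually implements it: the function name `first_timed_out_request` and the simulated service times are invented for the example, and the model assumes the timer simply restarts each time a new request reaches the head of the queue.

```python
def first_timed_out_request(service_times, timeout_seconds):
    """Return the index of the first request whose time at the head of the
    queue exceeds the timeout, or None if every request completes in time.

    service_times: seconds each request spends at the head of the queue
    before the (simulated) device completes it.
    """
    for index, seconds_at_head in enumerate(service_times):
        # The timer restarts whenever a new request reaches the head,
        # so each request is judged on its own time, not the running total.
        if seconds_at_head > timeout_seconds:
            return index
    return None

# Three quick requests, then one that the simulated device never finishes.
requests = [0.002, 0.010, 0.004, float("inf")]
print(first_timed_out_request(requests, timeout_seconds=10))      # the stalled request
print(first_timed_out_request(requests[:3], timeout_seconds=10))  # nothing times out
```

Note that in this model two slow requests of 6 seconds each do not trip a 10 second timeout, because the timer is reset between them; only a single request that stalls at the head of the queue does.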

    What is particularly interesting is that this timer is user tunable, to match the characteristics of the storage device(s). The setting is the registry value HKLM\System\CurrentControlSet\Services\Disk\TimeoutValue and it can take any integer value from 0 to 255, which is interpreted in seconds. If the value doesn't exist then its default is 10 seconds - which to be honest is huge on a modern system. The value is documented here, along with other tunable parameters, which I suggest you don't mess with!

    I was wondering whether you fancied increasing this timeout value to see what effect it has (if any)? Take a backup of the registry key first of course, and then experiment.


  5. Posts : 95
    Windows 10
    Thread Starter
       #145

    Interesting... I installed the new PSU cables yesterday and have not had a BSOD thus far, AND I tested that same benchmark test that had been crashing my PC 100% of the time, and it went through all the tests with no issues. If it ends up BSODing, then I'll try your suggestion. If then it ends up BSODing again, a colorful barbecue will be underway.


  6. Posts : 402
    Windows 10 and Windows 11
       #146

    Out of interest what is the current value of your HKLM\System\CurrentControlSet\Services\Disk\TimeoutValue entry? On my Windows 10 22H2 system it's 0x41 (that's 65 decimal, and a lot longer than the default). I've not changed it, that's how it was set at install time.
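If it helps, one way to read that entry without opening regedit is a few lines of Python. This is a minimal sketch, assuming Python is installed on the machine: it uses the standard library `winreg` module (Windows only), the helper names `read_disk_timeout` and `format_timeout` are made up for the example, and the 10 second default is simply the figure quoted earlier in this thread. It only reads the value and changes nothing.

```python
import sys

DISK_KEY = r"System\CurrentControlSet\Services\Disk"
DEFAULT_TIMEOUT_SECONDS = 10  # default quoted earlier in this thread

def read_disk_timeout():
    """Return TimeoutValue as an int, or None if it is absent or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, Windows only
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TimeoutValue")
    except FileNotFoundError:
        return None
    return int(value)

def format_timeout(value):
    """Show the value in both hex and decimal seconds."""
    if value is None:
        return f"TimeoutValue not found; default of {DEFAULT_TIMEOUT_SECONDS} seconds assumed"
    return f"TimeoutValue = {value:#x} ({value} seconds)"

print(format_timeout(read_disk_timeout()))
print(format_timeout(0x41))  # the value seen on the Windows 10 22H2 system above
```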

    Fingers crossed on the cables.....

    - - - Updated - - -

    You won't believe this, but on a different forum I have just come across another user with exactly the same NVMe problem as you. Identical. In every respect.

    I'll keep you posted.


  7. Posts : 95
    Windows 10
    Thread Starter
       #147

    Send the link over! Would love to keep an eye on it as well. My room with my computer is getting painted right now, but I should be able to update you late tomorrow night on that value you're interested in.


  8. Posts : 402
    Windows 10 and Windows 11
       #148

    The other user's system is a mess, he has a whole host of out of date drivers installed. I'm sure that's not the case with your system?

    Question - Random WHEA BSOD on a 80UR Ideapad 700-15isk | Tom's Hardware Forum


  9. Posts : 95
    Windows 10
    Thread Starter
       #149

    No, I've really never had anything out of date, including BIOS and NVIDIA drivers.


  10. Posts : 402
    Windows 10 and Windows 11
       #150

    The similarities are extraordinary. Here is your recent kernel dump...
    Code:
    WHEA_UNCORRECTABLE_ERROR (124)
    A fatal hardware error has occurred. Parameter 1 identifies the type of error
    source that reported the error. Parameter 2 holds the address of the
    WHEA_ERROR_RECORD structure that describes the error conditon. Try !errrec Address of the WHEA_ERROR_RECORD structure to get more details.
    Arguments:
    Arg1: 0000000000000010, Error Source Type
    Arg2: ffff898c326f3028
    Arg3: ffff898c173b3aac
    Arg4: ffff898c1741b1a0
    
    .......
    
    9: kd> knL
     # Child-SP          RetAddr               Call Site
    00 ffff8006`e78c2388 fffff801`3a7b9abc     nt!KeBugCheckEx
    01 ffff8006`e78c2390 fffff801`3a7ba619     nt!WheaReportHwError+0x3ec
    02 ffff8006`e78c2470 fffff801`3a7ba735     nt!WheaHwErrorReportSubmitDeviceDriver+0xe9
    03 ffff8006`e78c24a0 fffff801`3d273721     nt!WheaReportFatalHwErrorDeviceDriverEx+0xf5
    04 ffff8006`e78c2500 fffff801`3d26cb00     storport!StorpWheaReportError+0x9d
    05 ffff8006`e78c2590 fffff801`3d23f07c     storport!StorpMarkDeviceFailed+0x358
    06 ffff8006`e78c2820 fffff801`3d2fc33d     storport!StorPortNotification+0x91c
    07 ffff8006`e78c28f0 fffff801`3d2ff55a     stornvme!ControllerReset+0x1a1
    08 ffff8006`e78c2970 fffff801`3d2fe4af     stornvme!NVMeControllerReset+0x10a
    09 ffff8006`e78c29a0 fffff801`3d26a1f6     stornvme!NVMeControllerAsyncResetWorker+0x3f
    0a ffff8006`e78c29d0 fffff801`3a443f85     storport!StorPortWorkItemRoutine+0x46
    0b ffff8006`e78c2a00 fffff801`3a48e5c5     nt!IopProcessWorkItem+0x135
    0c ffff8006`e78c2a70 fffff801`3a5265f5     nt!ExpWorkerThread+0x105
    0d ffff8006`e78c2b10 fffff801`3a6048d8     nt!PspSystemThreadStartup+0x55
    0e ffff8006`e78c2b60 00000000`00000000     nt!KiStartSystemThread+0x28
    And here is the other user's dump...
    Code:
    WHEA_UNCORRECTABLE_ERROR (124)
    A fatal hardware error has occurred. Parameter 1 identifies the type of error
    source that reported the error. Parameter 2 holds the address of the
    WHEA_ERROR_RECORD structure that describes the error conditon. Try !errrec Address of the WHEA_ERROR_RECORD structure to get more details.
    Arguments:
    Arg1: 0000000000000010, Error Source Type
    Arg2: ffffe18da36e2028
    Arg3: ffffe18d97452aac
    Arg4: ffffe18d974651a0
    
    ......
    
    5: kd> knL
     # Child-SP          RetAddr               Call Site
    00 ffffca8f`4dab8388 fffff803`10fb583c     nt!KeBugCheckEx
    01 ffffca8f`4dab8390 fffff803`10fb6399     nt!WheaReportHwError+0x3ec
    02 ffffca8f`4dab8470 fffff803`10fb64b5     nt!WheaHwErrorReportSubmitDeviceDriver+0xe9
    03 ffffca8f`4dab84a0 fffff803`15ea2035     nt!WheaReportFatalHwErrorDeviceDriverEx+0xf5
    04 ffffca8f`4dab8500 fffff803`15e9b4c0     storport!StorpWheaReportError+0x9d
    05 ffffca8f`4dab8590 fffff803`15e81c02     storport!StorpMarkDeviceFailed+0x358
    06 ffffca8f`4dab8820 fffff803`15e3a00d     storport!StorPortNotification+0x149d2
    07 ffffca8f`4dab88f0 fffff803`15e3d192     stornvme!ControllerReset+0x1a1
    08 ffffca8f`4dab8970 fffff803`15e3c10f     stornvme!NVMeControllerReset+0x10a
    09 ffffca8f`4dab89a0 fffff803`15e98c11     stornvme!NVMeControllerAsyncResetWorker+0x3f
    0a ffffca8f`4dab89d0 fffff803`10d5a4c5     storport!StorPortWorkItemRoutine+0x41
    0b ffffca8f`4dab8a00 fffff803`10c25975     nt!IopProcessWorkItem+0x135
    0c ffffca8f`4dab8a70 fffff803`10d17e85     nt!ExpWorkerThread+0x105
    0d ffffca8f`4dab8b10 fffff803`10dfd498     nt!PspSystemThreadStartup+0x55
    0e ffffca8f`4dab8b60 00000000`00000000     nt!KiStartSystemThread+0x28
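For anyone who wants to check the two stacks against each other mechanically rather than by eye, here is a short Python sketch. The frames are copied from the two `knL` listings above (frames 04 to 0a only, to keep it short); the function names `call_sites`, `strip_offset` and `differing_frames` are hypothetical helpers written for this example, not part of any debugger tooling.

```python
MY_DUMP = r"""
04 ffff8006`e78c2500 fffff801`3d26cb00     storport!StorpWheaReportError+0x9d
05 ffff8006`e78c2590 fffff801`3d23f07c     storport!StorpMarkDeviceFailed+0x358
06 ffff8006`e78c2820 fffff801`3d2fc33d     storport!StorPortNotification+0x91c
07 ffff8006`e78c28f0 fffff801`3d2ff55a     stornvme!ControllerReset+0x1a1
08 ffff8006`e78c2970 fffff801`3d2fe4af     stornvme!NVMeControllerReset+0x10a
09 ffff8006`e78c29a0 fffff801`3d26a1f6     stornvme!NVMeControllerAsyncResetWorker+0x3f
0a ffff8006`e78c29d0 fffff801`3a443f85     storport!StorPortWorkItemRoutine+0x46
"""

OTHER_DUMP = r"""
04 ffffca8f`4dab8500 fffff803`15e9b4c0     storport!StorpWheaReportError+0x9d
05 ffffca8f`4dab8590 fffff803`15e81c02     storport!StorpMarkDeviceFailed+0x358
06 ffffca8f`4dab8820 fffff803`15e3a00d     storport!StorPortNotification+0x149d2
07 ffffca8f`4dab88f0 fffff803`15e3d192     stornvme!ControllerReset+0x1a1
08 ffffca8f`4dab8970 fffff803`15e3c10f     stornvme!NVMeControllerReset+0x10a
09 ffffca8f`4dab89a0 fffff803`15e98c11     stornvme!NVMeControllerAsyncResetWorker+0x3f
0a ffffca8f`4dab89d0 fffff803`10d5a4c5     storport!StorPortWorkItemRoutine+0x41
"""

def call_sites(knl_text):
    """Extract the Call Site column from knL frame lines."""
    sites = []
    for line in knl_text.strip().splitlines():
        parts = line.split()
        # Expected columns: frame number, Child-SP, RetAddr, Call Site
        if len(parts) == 4:
            sites.append(parts[3])
    return sites

def strip_offset(site):
    """Drop the +0x... offset, keeping only module!function."""
    return site.split("+")[0]

def differing_frames(stack_a, stack_b):
    """Return (index, site_a, site_b) for positions where the call sites differ."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(stack_a, stack_b)) if a != b]

mine = call_sites(MY_DUMP)
other = call_sites(OTHER_DUMP)

# Same functions in the same order?
print([strip_offset(s) for s in mine] == [strip_offset(s) for s in other])
# Which frames differ once the offsets are included?
for index, a, b in differing_frames(mine, other):
    print(index, a, b)
```

In these frames the module and function names line up one for one; the only differences are the offsets within storport!StorPortNotification and storport!StorPortWorkItemRoutine.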
    Can you let me know the value that you have in the registry key HKLM\System\CurrentControlSet\Services\Disk\TimeoutValue?


 
