Misbehaving 2080 ti (Nvlddmkm.sys)


  1. Posts : 4
    Windows 10
       #1


    Greetings, I'll try to provide as much info as I can on the issue I'm facing so please bear with me.

    A couple months ago, my 2080 ti that has been working flawlessly in this machine for 4+ years started locking up my entire system in games to the point where I had to hard reset using the button on the case. (System became unresponsive, monitor was stuck on the last rendered frame.) OCCT 3d adaptive tests resulted in a system hang within 5 minutes, and the power tests froze the system pretty much instantly.
    I've made sure that all temperatures (core, hotspot, memory) are within safe operating range. I've also made sure that the PSU rails are within safe operating range under load (±5%). I've tried multiple driver versions (using DDU in safe mode), undervolting, and underclocking. At one point, I thought that a new driver had fixed the issue, as the GPU started working properly again for about 4 weeks. Then one day, without me making any changes to hardware or software, it started crashing again. This time, the lock-ups were far more frequent than during the first episode of issues.
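
    A minimal sketch of the ±5% rail check described above, assuming the nominal ATX rail voltages (+12 V, +5 V, +3.3 V). The sample readings are hypothetical and not taken from this system, and software sensor readings can be inaccurate, so a pass here does not rule out ripple or transient problems.

```python
# Nominal ATX rail voltages; the 5% tolerance is the limit quoted in the post.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def rail_within_tolerance(rail, measured, tolerance=TOLERANCE):
    """Return True if the measured voltage is within +/- tolerance of nominal."""
    nominal = NOMINAL_RAILS[rail]
    return abs(measured - nominal) <= nominal * tolerance


def check_rails(readings):
    """Map each rail name to True (in range) or False (out of range)."""
    return {rail: rail_within_tolerance(rail, volts) for rail, volts in readings.items()}


if __name__ == "__main__":
    # Hypothetical readings under load, e.g. copied by hand from a monitoring tool.
    sample = {"+12V": 11.90, "+5V": 5.04, "+3.3V": 3.28}
    for rail, ok in check_rails(sample).items():
        print(rail, "OK" if ok else "OUT OF RANGE")
```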

    All LiveKernelReports & Minidumps point to an issue with Nvlddmkm.sys

    I understand that while this clearly indicates an issue with the driver, there could be other factors influencing this such as bad ram sticks, xmp profiles, cpu overclocks, the card itself, psu, etc.

    Out of frustration, I gave up on troubleshooting and put an old gtx 960 in the system. It's been rock solid stable for about 3 months. Not a single lock up, no BSOD's, nothing.

    Please correct me if I'm wrong, but in my mind, this eliminates the possibility of the issues being caused by ram, cpu settings, bad pcie slot, etc.

    Today, I took the 2080 ti over to a friend's house, put it in his machine, fired it up and the OCCT tests passed with no issues. We also ran FurMark - no issues.

    At this point, I think that the PSU is the main suspect even though the voltages are rock solid and the gtx 960 runs absolutely fine on it. (Though it's far less power hungry than the 2080ti) It's a Seasonic Prime Ultra Gold 850w that's been in use for about 4 years and has a 12 year warranty.

    System specs:

    Aorus z390 master F7 bios (older bios version, but has been working fine with the 2080 ti for years, and as of now, works fine with the gtx 960)

    I9 9900k @4.7ghz, turbo boost off, multi core enhancement off, undervolted. (has been working flawlessly for 5 years and still does with the gtx 960)

    32gb DDR4 Kingston Hyperx Predator 2666mhz xmp profile (again, works fine with the gtx 960 and the sticks are 100% fully compatible with the motherboard)

    Windows 10 pro 22h2 fully up to date

    Am I missing something here, or should I try a different PSU at this point? A software conflict has also crossed my mind. I've had this installation of Windows pretty much since I built the system in late 2019, but I'm taking really good care of it. Everything is up to date, there's absolutely no malware/viruses, and whenever I update drivers, I make sure not to leave any residual files behind. But again - if it was a software conflict, how come the GTX 960 works absolutely fine?

    TLDR:
    RTX 2080 ti is unstable in my system.
    GTX 960 runs fine in my system.
    The same RTX 2080 ti runs fine in a different system.

    Many thanks in advance to anyone that's willing to dedicate a couple minutes of their time for feedback.
    Last edited by 9489; 4 Weeks Ago at 14:59.


  2. Posts : 24,026
    Win 10 Home ♦♦♦19045.4842 (x64) [22H2]
       #2

    Hello @9489 and welcome to Ten Forums.

    Try this...

    1. Disable driver updates in Windows Update, following this tutorial: Enable or Disable Driver Updates in Windows Update in Windows 10

    2. Then use DDU again, and then reinstall the vid card driver, like so...


    DDU Instructions - Nvidia
    1. Get this program here: Display Driver Uninstaller (DDU) | Wagnardsoft. Get the latest version and save it to your desktop.
    2. Get the DCH/Game Ready vid card driver here, using the Manual Search: Advanced Driver Search | NVIDIA. Save this to your desktop.
    Disconnect from the internet completely.
    3. Reboot into Safe Mode and run DDU (Display Driver Uninstaller). Choose the "Clean and restart (Highly recommended)" option and just do what it tells you.
    4. After it's done, reboot to normal mode, then just double-click the Nvidia driver to install it. If it wants to reboot, let it.
    Reconnect the internet.




    Last edited by Ghot; 4 Weeks Ago at 01:18.


  3. Posts : 2,568
    Windows 10 Pro 64bit
       #3

    +1 for DDU.


  4. Posts : 1,484
    Windows 10
       #4

    They already mentioned using DDU. Studio drivers are better imo because they are the more stable drivers, so in this case I would opt for one.

    9489 said:
    I've tried multiple driver versions (using DDU in safe mode),
    You tested the PSU and it came back fine, so I would not doubt that either, although you could RMA it if you want because you have the warranty.

    GPU driver BSODs can be caused by the stupidest of things. Did you read the dump file in a debugger? Sometimes they give easy clues, other times they don't, and sometimes they are straight-up non-descriptive. In those cases you may need someone else to look at it, someone who is good at drawing conclusions from limited information.

    9489 said:
    Out of frustration, I gave up on troubleshooting and put an old gtx 960 in the system. It's been rock solid stable for about 3 months. Not a single lock up, no BSOD's, nothing.

    Please correct me if I'm wrong, but in my mind, this eliminates the possibility of the issues being caused by ram, cpu settings, bad pcie slot, etc.
    Yes, I would think so. The issue lies with the RTX card/driver and whatever else it is conflicting with. The 960 is GTX, not RTX, so using that card showed you a very clear divide because it's two totally different drivers.


  5. Posts : 4
    Windows 10
    Thread Starter
       #5

    Malneb said:
    You tested the PSU and it came back fine, so I would not doubt that either, although you could RMA it if you want because you have the warranty.
    I should've mentioned that I only tested the rails through software. Those readings could be inaccurate, but they're nowhere near dipping below or shooting above the 5% limit. I've also learned that the older (pre 2018) Seasonic Prime units had issues with handling transient spikes resulting in tripping the OCP and shut downs. That's not what I'm experiencing, but I can't help but wonder if this PSU was from the bad batch and could be causing hard freezes due to wear and tear? I couldn't find the year it was made anywhere.

    The model is: SSR-850GD2
    Manufacturer's code: 1GD285FRT3A13W

    Malneb said:
    GPU driver BSODs can be caused by the stupidest of things. Did you read the dump file in a debugger? Sometimes they give easy clues, other times they don't, and sometimes they are straight-up non-descriptive. In those cases you may need someone else to look at it, someone who is good at drawing conclusions from limited information.
    I did look at the dumps in WinDbg, but the only thing I understand is that there's an issue with Nvlddmkm.sys for whatever reason. I'll attach the dump files for you to look at: Upload Files | Free File Upload and Transfer Up To 20 GB (couldn't upload the zip file here on the forums for some reason)
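
    Before opening the dumps in WinDbg, it can help to see how the crashes cluster over time. The following is a rough sketch that lists `.dmp` files with their timestamps, oldest first. The two default folders are the usual Windows locations for minidumps and live kernel reports; the function name `list_dumps` is just for illustration, and reading these folders may require an elevated prompt.

```python
from datetime import datetime
from pathlib import Path

# Usual Windows locations for minidumps and live kernel reports.
DEFAULT_DUMP_DIRS = [
    Path(r"C:\Windows\Minidump"),
    Path(r"C:\Windows\LiveKernelReports"),
]


def list_dumps(dirs=DEFAULT_DUMP_DIRS):
    """Return (modified_time, path) pairs for every .dmp file, oldest first."""
    dumps = []
    for directory in dirs:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # Folder does not exist on this machine; skip it.
        for dump in directory.rglob("*.dmp"):
            modified = datetime.fromtimestamp(dump.stat().st_mtime)
            dumps.append((modified, dump))
    return sorted(dumps, key=lambda item: item[0])


if __name__ == "__main__":
    for modified, dump in list_dumps():
        print(modified.strftime("%Y-%m-%d %H:%M"), dump)
```

    Lining these timestamps up against driver installs and Windows updates is one way to check whether the freezes really return a few days or weeks after each clean install.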

    - - - Updated - - -

    Update:

    The 2080 ti is back in my system. The last time I used it (3 months ago), it didn't let me play Battlefield for longer than 5 minutes before hard freezing the entire system. Since then, nothing in my system has changed apart from some Windows updates and the latest GPU driver. Today, the GPU has been working fine for about 2 hours of intensive use. It appears as if the driver fixed it, but I've been there before, and it only lasted a couple of weeks. This is really strange behavior, and I'm assuming that if it was a hardware issue, the symptoms would be consistent and easily reproducible?
    Last edited by 9489; 4 Weeks Ago at 06:23.


  6. Posts : 1,484
    Windows 10
       #6

    I am not very good with dump files either, but there are a couple of people here who know how to read them properly.

    Memory corruption: I think it was trying to access and write to memory that was not free. The other two dumps will be a result of the first one, because the watchdog picked up that something was wrong. Past that I am not really sure. The reason why it's faulting will not be clear. The driver is always a pain; it shits the bed for many reasons, most of the time BS ones.


  7. Posts : 2,788
    Windows 10
       #7

    The RTX 2080 Ti has a TDP of 250 W. The GTX 960's TDP is 120 W.

    You have a good-quality PSU, so I doubt there is any problem there.

    There are a couple of things to check that are primarily electrical engineering concerns.

    1. The RTX 2080 Ti has a TDP of 250w therefore cable(s) would have to be rated for the power, good quality, and installed properly.

    A voltmeter does not show things like ripple, noise, transients, contact oxidation, and so on, all of which increase with load.

    So I would check on ratings of cables, what is required by the Card specs, and all the connectors concerned inspected as regards damage and oxidation.
    Anything in dubious condition or underrated should be replaced.


  8. Posts : 4
    Windows 10
    Thread Starter
       #8

    Helmut said:
    1. The RTX 2080 Ti has a TDP of 250w therefore cable(s) would have to be rated for the power, good quality, and installed properly.
    I'm using the cables that came with the PSU. On the GPU, there are 2x8 pin and 1x6 pin power connectors. I have two individual PCIe cables supplying each of the two 8 pin connectors. One of these two cables is also supplying the additional 6 pin connector, which is only there for overclocking and/or using GPU vbioses with higher power limits. As far as I know, this is the correct configuration and it has been working fine for years prior to this problem appearing.
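
    As a rough sanity check of this cable layout, here is a sketch that adds up the PCIe specification limits per connector (75 W for the slot, 75 W per 6-pin, 150 W per 8-pin). The cable names are made up for illustration, and the real draw per connector depends on the card's power design and vBIOS, so treat the numbers as spec ceilings rather than measurements.

```python
# PCIe specification power limits in watts.
PCIE_SLOT_W = 75
CONNECTOR_LIMIT_W = {"6-pin": 75, "8-pin": 150}

# Layout described in the post: one cable per 8-pin connector,
# with the second cable also feeding the extra 6-pin connector.
CABLES = {
    "cable_1": ["8-pin"],
    "cable_2": ["8-pin", "6-pin"],
}

GPU_TDP_W = 250  # RTX 2080 Ti TDP as quoted in the thread.


def cable_spec_load(connectors):
    """Worst-case load on one cable if every connector runs at its spec limit."""
    return sum(CONNECTOR_LIMIT_W[c] for c in connectors)


def total_budget(cables):
    """Slot power plus the spec limit of every connector on every cable."""
    return PCIE_SLOT_W + sum(cable_spec_load(c) for c in cables)


if __name__ == "__main__":
    for name, connectors in CABLES.items():
        print(name, cable_spec_load(connectors), "W at spec limit")
    budget = total_budget(CABLES.values())
    print("Total spec budget:", budget, "W; headroom over TDP:", budget - GPU_TDP_W, "W")
```

    TDP is a sustained figure, and short transient spikes can go well above it, so headroom on paper does not by itself clear the PSU or the cables.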

    Helmut said:
    So I would check on ratings of cables, what is required by the Card specs, and all the connectors concerned inspected as regards damage and oxidation.
    Anything in dubious condition or underrated should be replaced.
    I did inspect the connectors, and they look as good as new. As far as visuals, I have no reason to doubt the cables.

    I ran the OCCT power test earlier today, it pulls 500w on CPU & GPU combined. It ran fine.
    The last time I ran this test in the initial stage of troubleshooting months ago, it resulted in system freezes, restarts and/or BSODs pretty much instantly.

    I really can't wrap my head around this. Why does the system sometimes work and sometimes it doesn't?

    I have a really hard time believing that a new driver fixed it, because I've been there before and the issues always eventually came back. It took anywhere from a couple of days to a couple of weeks after a clean driver installation for the system freezes to start recurring.


  9. Posts : 1,484
    Windows 10
       #9

    9489 said:
    I have a really hard time believing that a new driver fixed it
    It really could be that simple though. BSODs are hard to pinpoint unless you are really good at interpreting the dumps, or at least good at testing things in a methodical way. Drivers are often the reason, and when a new update comes out, sometimes it's to fix bugs that could be causing issues like BSODs.

    The whole NVIDIA driver stack is a bitch; it's always crashing for stupid reasons, at least from what I have seen. Much of the time when you look at a BSOD that someone uploads, something to do with NVIDIA's stack is in it.
    I am not advanced enough to read them fully and understand everything, but I know some of the basics. It's something I am teaching myself slowly.

    I do have several AMD cards and I have not seen them crash much, if at all, although I have not used AMD cards anywhere near as much as NVIDIA ones, as I have always liked NVIDIA more even though they seem to have had a bad driver stack for a long time. I don't know if this is caused by other parts of the computer or if it's actually NVIDIA; all I know is that their stack comes up often, and it's not an isolated case: you see it often when looking at dumps from other people.


  10. Posts : 4
    Windows 10
    Thread Starter
       #10

    Malneb said:
    The whole NVIDIA driver stack is a bitch; it's always crashing for stupid reasons, at least from what I have seen. Much of the time when you look at a BSOD that someone uploads, something to do with NVIDIA's stack is in it.
    I'll agree that the drivers are sometimes problematic, but in my 15+ years of building custom systems for myself, I've never had them cause this much trouble. In the past, I ran into the usual TDR issues where it would just reset the driver and the system kept working afterwards. This is the first time I'm seeing TDR failures resulting in completely freezing the system.

    Also, I've only had one real BSOD with this card as per the minidump. I suspect it was caused by the driver getting corrupted from the reoccurring TDR failures & me hard resetting the system, which is the main problem.

    What I really don't understand is that I can sometimes play a game for several hours, and sometimes it'll just freeze within the first 5-20 minutes of gameplay.

    Guess I'll just have to cross my fingers and hope it doesn't come back. If it does, I really don't know what steps to take next.

    Again, thank you to everyone for your inputs.

