DRAM isn't nearly as reliable as vendors would like you to think. Now researchers have shown that bit flips can be induced maliciously by simple user-level programs. Will vendors fix the problem?
As documented in "DRAM error rates: Nightmare on DIMM street," DRAM error rates are hundreds to thousands of times higher than previously thought: a mean of 3,751 correctable errors per DIMM per year. That figure assumes your DIMM has error-correcting code (ECC) to correct those errors. If not:
Everything is fine until the data corruption means a missed memory reference or an incorrect value or a flipped bit in a file writing to disk. What you see is a "file not found" or a "file not readable" message or, worse yet, silent data corruption - or even a system crash. And nothing that says "memory error."
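To put the 3,751 figure in more familiar units, here is a minimal Python sketch that converts it to a daily rate. It assumes errors are spread evenly over time and across DIMMs, which a mean alone cannot confirm; the function name and the four-DIMM example are illustrative only.

```python
# Mean correctable errors per DIMM per year, as quoted in the article.
ERRORS_PER_DIMM_PER_YEAR = 3751

HOURS_PER_YEAR = 365 * 24  # leap years ignored for simplicity


def mean_errors(dimms: int, hours: float) -> float:
    """Expected correctable errors for `dimms` modules over `hours` hours.

    ASSUMPTION: errors arrive at a uniform rate on every DIMM. The
    article only gives a mean, so this is an average, not a prediction.
    """
    return ERRORS_PER_DIMM_PER_YEAR * dimms * hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # One DIMM over one day, then a hypothetical 4-DIMM machine.
    print(f"1 DIMM, 1 day:  {mean_errors(1, 24):.1f} errors")
    print(f"4 DIMMs, 1 day: {mean_errors(4, 24):.1f} errors")
```

On average that works out to roughly ten correctable errors per DIMM per day.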
In "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," researchers Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Konrad Lai and Onur Mutlu - all of CMU - and Chris Wilkerson of Intel Labs found that commodity DRAM chips are vulnerable to disturbance errors. Moore's Law has shrunk memory cells and made them more susceptible to electrical interference from neighboring cells.
By reading from the same address in DRAM, we show that it is possible to corrupt data in nearby addresses. More specifically, activating the same row in DRAM corrupts data in nearby rows. We demonstrate this phenomenon on Intel and AMD systems using a malicious program that generates many DRAM accesses. We induce errors in most DRAM modules (110 out of 129) from three major DRAM manufacturers. From this we conclude that many deployed systems are likely to be at risk.
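The root cause of the errors is rapid voltage fluctuation on the wordline of a row of memory cells. The wordline voltage is raised in order to read the bits in that row, and toggling it over and over disturbs cells in adjacent rows, which can then lose their charge before the next refresh.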
A program that issues as few as 139,000 reads to a specific wordline can induce an error. As many as 1 in every 1,700 cells is susceptible to such errors.
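A rough Python sketch shows what those two numbers imply. The 64 ms refresh window and the 2 GiB module size are assumptions chosen for illustration, not figures from the article, and the function names are hypothetical. The sketch also assumes the reads must all land between two refreshes of the victim row.

```python
READS_TO_INDUCE_ERROR = 139_000  # from the article
SUSCEPTIBLE_ONE_IN = 1_700       # from the article ("as many as" 1 in 1,700)

# ASSUMPTION: each row is refreshed every 64 ms, a common DRAM setting.
REFRESH_WINDOW_S = 0.064
# ASSUMPTION: a hypothetical 2 GiB module, one cell per bit.
MODULE_BYTES = 2 * 1024**3


def required_read_rate(reads: int, window_s: float) -> float:
    """Reads per second needed to fit `reads` into one refresh window."""
    return reads / window_s


def susceptible_cells(module_bytes: int, one_in: int) -> int:
    """Cells expected to be susceptible if 1 in `one_in` cells is affected."""
    return (module_bytes * 8) // one_in


if __name__ == "__main__":
    rate = required_read_rate(READS_TO_INDUCE_ERROR, REFRESH_WINDOW_S)
    print(f"required rate: {rate / 1e6:.2f} million reads/s")
    print(f"time per read: {1e9 / rate:.0f} ns")
    cells = susceptible_cells(MODULE_BYTES, SUSCEPTIBLE_ONE_IN)
    print(f"susceptible cells in 2 GiB: about {cells / 1e6:.1f} million")
```

Under these assumptions an attacker needs a little over two million reads per second to one row, and the worst-case susceptibility rate would leave about ten million vulnerable cells in a 2 GiB module.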