Proceedings of the International Symposium on Memory Systems 2019
DOI: 10.1145/3357526.3357558

DRAM errors in the field

Abstract: This paper summarizes our two-year study of corrected and uncorrected errors on the MareNostrum 3 supercomputer, covering 2000 billion MB-hours of DRAM in the field. The study analyzes 4.5 million corrected and 71 uncorrected DRAM errors, and it compares the reliability of DIMMs from all three major memory manufacturers, built in three different technologies. Our work has two sets of contributions. First, we illustrate the complexity of in-field DRAM error analysis and demonstrate the limitations of various wid…

Cited by 9 publications (1 citation statement)
References 15 publications (33 reference statements)

“…In an attempt to reduce the number of main memory devices in a system, DRAM manufacturers have increased the density of the memory cells within the same device; an approach that has strong consequences: uncorrected error rates in DRAM devices increase when manufacturing technology scales down [150]. Researchers have analyzed the impact of the miniaturization process, finding serious implications for data integrity.…”
Section: Memory Reliability and Security (mentioning)
confidence: 99%