Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, 2012
DOI: 10.1145/2150976.2150989

Cosmic rays don't strike twice

Cited by 146 publications (7 citation statements)
References 18 publications
“…According to early work at IBM (Ziegler and Lanford, 1979), errors affecting memory devices can be divided into two basic groups: hard errors are those caused by a physical defect, while soft errors are transient in nature and caused by some kind of electromagnetic interaction, such as a cosmic ray strike. Considerable work has been carried out on understanding the causes and effects of cosmic rays on silicon devices (Ziegler and Lanford, 1979, 1981; Ziegler, 1996; Ziegler et al, 1996), in particular on their effect on DRAM devices (McKee and McAdams, 1996; Borucki et al, 2008; Fang et al, 2009; Hwang et al, 2012).…”
Section: Introduction
mentioning
confidence: 99%
“…If every region of memory is equally likely to experience an uncorrectable error, we would expect to see relatively few errors in kernel memory because it typically occupies a much smaller memory footprint than the application. However, recent evidence suggests that kernel memory may be more prone to memory errors than other regions of memory (Hwang et al, 2012).…”
Section: Introduction
mentioning
confidence: 99%
“… 1. This might happen if, for example, the MCE was raised by a memory scrubber. However, it is not clear that this is a common scenario (Hwang et al, 2012). …”
mentioning
confidence: 99%
“…as well as software (operating system, runtime, unscheduled maintenance interruption). In fact, recent work indicates that (i) servers tend to crash twice a year (2-4% failure rate) [1], (ii) 1-5% of disk drives die per year [2], (iii) DRAM errors occur in 2% of all DIMMs per year [1], which is more frequent than commonly believed, and (iv) large scale studies indicate that simple ECC mechanisms alone are not capable of correcting a significant number of DRAM errors [3]. Even for small systems, such causes result in fairly low mean-time-between-failures/interrupts (MTBF/I) as depicted in Figure I [4], and the 6.9 hours estimated by Livermore National Lab for its BlueGene confirms this.…”
Section: Introduction
mentioning
confidence: 99%