2005
DOI: 10.1109/tdmr.2005.855685

Predicting the number of fatal soft errors in Los Alamos National Laboratory's ASC Q supercomputer

Cited by 125 publications (59 citation statements)
References 17 publications
“…These soft errors do not cause permanent hardware damage, but may result in a complete system failure. For example, Sun Microsystems acknowledges that customers such as America Online, eBay and Los Alamos National Labs experienced system failures caused by transient faults [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, the 100,000 node BlueGene/L scheme at Lawrence Livermore National Laboratory (LLNL) practices an L1 cache bit error every 8 hours [36] and a hard failure every 7-10 days. Exascale schemes are expected to fail every 3-26 minutes [37], [38].…”
Section: Related Work (mentioning)
confidence: 99%
“…In 2005, Hewlett Packard stated that on a 2048-node supercomputer in Los Alamos National Laboratory, a higher-than-expected number of single-node failures were observed and the primary cause of these failures were transient faults induced by cosmic ray strikes [32].…”
Section: Motivation (mentioning)
confidence: 99%