SC14: International Conference for High Performance Computing, Networking, Storage and Analysis 2014
DOI: 10.1109/sc.2014.53

Understanding Soft Error Resiliency of Blue Gene/Q Compute Chip through Hardware Proton Irradiation and Software Fault Injection

Cited by 27 publications (14 citation statements)
References 16 publications
“…These breakdowns are usually defined as hard faults, they are in general reproducible and, if unaddressed, bring programs to halt. Soft faults are instead usually caused by fluctuations in radiation that introduce spurious modifications in the program data in the form of bit flips and are usually non-reproducible (Cher et al, 2014). Faults are further distinguished into detected and corrected, detected and uncorrectable, and undetected ones.…”
Section: Resilience Methodologies (mentioning)
confidence: 99%
“…Cher et al [12] use both proton irradiation and SFI to study the soft error resilience of BlueGene/Q. Proton bombardment shows that BG/Q has a mean time between correctable errors of 1.5 days validating the need for detection mechanisms.…”
Section: Related Work (mentioning)
confidence: 99%
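
The citation statements above describe soft faults as bit flips in program data and mention software fault injection (SFI) as one way to study them. As a minimal sketch of what a single-bit fault looks like, the snippet below inverts one bit of a double-precision value. The helper name `flip_bit` is hypothetical and chosen for illustration only; it is not the injection tool used by Cher et al., whose method is not described on this page.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 binary64 encoding inverted.

    `bit` counts from 0 (least significant mantissa bit) to 63 (sign bit).
    Hypothetical helper for illustration; not taken from the cited paper.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range [0, 64)")
    # Reinterpret the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    # XOR with a one-hot mask inverts exactly one bit.
    corrupted = bits ^ (1 << bit)
    # Reinterpret the corrupted integer as a double again.
    (result,) = struct.unpack("<d", struct.pack("<Q", corrupted))
    return result


if __name__ == "__main__":
    # The impact depends heavily on which bit is hit:
    print(flip_bit(1.0, 0))   # lowest mantissa bit: a tiny perturbation
    print(flip_bit(1.0, 52))  # lowest exponent bit: the value is halved
    print(flip_bit(1.0, 63))  # sign bit: the value is negated
```

Flipping the same bit twice restores the original value, which is a convenient sanity check when experimenting with a sketch like this.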
“…Also, a large-scale study [28] of DRAM errors on Google datacenters was done some years ago, showing that error rates are higher than previously reported. Of particular relevance are these large-scale studies of DRAM errors in production systems [21], [29] with a full characterization of hard and soft errors in production systems. These studies are important milestones given the scale of the analysis (i.e., millions of DIMM days, hundreds of Terabyte-years).…”
Section: Related Work (mentioning)
confidence: 99%