2009 International Symposium on Code Generation and Optimization
DOI: 10.1109/cgo.2009.14

ESoftCheck: Removal of Non-vital Checks for Fault Tolerance

Abstract: As semiconductor technology scales into the deep submicron regime, the occurrence of transient or soft errors will increase. This will require new approaches to error detection. Software checking approaches are attractive because they require little hardware modification and can be easily adjusted to fit different reliability and performance requirements. Unfortunately, software checking adds a significant performance overhead. In this paper we present ESoftCheck, a set of compiler optimization techniqu…

Citations: cited by 27 publications (10 citation statements)
References: 35 publications
“…Similarly, transactional semantics [24] can be used for MPI, but not on application variables that may be corrupted. Esoftcheck [46], uses compiler analysis to remove redundant SDC detectors to maintain high reliability, but does not consider the latency of detection and how it effects propagation. An analytic version of this problem which investigates optimal placement of detectors of different capabilities to verify a checkpoint is corruption free is presented in [4], but considers a fixed recovery time that does not change based on how much state is corrupted.…”
Section: Related Work (mentioning)
confidence: 99%
“…Furthermore, redundancy can be used in order to reduce the probability of ‘bad errors’. Critical computations can be executed twice (and the redundancy can be introduced automatically by a compiler; see Reis et al, 2005b; Yu et al, 2009); more reliable memory may be used for more sensitive data, and so forth.…”
Section: Possible Scenarios (mentioning)
confidence: 99%
“…Soft errors can be handled by dual modular redundancy (DMR). DMR approaches, typically assisted by compilers, duplicate computing instructions and insert check instructions into the original programs [14,40,41,48,68]. DMR is very general and can be applied to any application, but it introduces high overhead especially for computing-bound applications because it duplicates all computations.…”
Section: Introduction (mentioning)
confidence: 99%
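
The last statement above describes dual modular redundancy as duplicating computing instructions and inserting check instructions. A minimal sketch of that general idea follows, written by hand in Python rather than generated by a compiler. It is not ESoftCheck's method or any cited tool's output; the names `dot_dmr`, `SoftErrorDetected`, and the `inject_fault_at` test hook are hypothetical and exist only for illustration.

```python
class SoftErrorDetected(Exception):
    """Raised when the original and the duplicated computation disagree."""


def dot_unprotected(xs, ys):
    """Plain dot product with no redundancy."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def dot_dmr(xs, ys, inject_fault_at=None):
    """Dot product with every computation duplicated and one check at the end.

    inject_fault_at is a hypothetical test hook: after the given iteration,
    one bit of the primary accumulator is flipped to mimic a transient error.
    """
    acc = 0      # original computation
    acc_dup = 0  # duplicated (shadow) computation
    for i, (x, y) in enumerate(zip(xs, ys)):
        acc += x * y      # original instruction
        acc_dup += x * y  # duplicated instruction
        if i == inject_fault_at:
            acc ^= 1 << 3  # simulated bit flip in the primary copy only
    # Check instruction: compare both copies before the value leaves the function.
    if acc != acc_dup:
        raise SoftErrorDetected(f"mismatch: {acc} != {acc_dup}")
    return acc
```

In this sketch the comparison runs once, just before the result is returned, rather than after every duplicated operation. Where and how often such checks run is a design choice that trades detection latency against overhead, which is the concern the first citation statement raises.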