International Symposium on Code Generation and Optimization
DOI: 10.1109/cgo.2005.34

SWIFT: Software Implemented Fault Tolerance

Abstract: To improve performance and reduce power, processor designers employ advances that shrink feature sizes, lower voltage levels, reduce noise margins, and increase clock rates. However, these advances make processors more susceptible to transient faults that can affect correctness. While reliable systems typically employ hardware techniques to address soft-errors, software techniques can provide a lower-cost and more flexible alternative. This paper presents a novel, software-only, transient-fault-detection techn…

Cited by 568 publications (427 citation statements)
References 29 publications

“…Software countermeasures can be designed at different levels, such as at an algorithmic level [6], in a high-level programming language [7][8][9] or at assembly level [10][11][12]. While higher level countermeasures may be optimized away or altered by a compiler, low-level countermeasures are compatible with existing compilers and toolchains.…”
Section: Many Countermeasures (mentioning)
confidence: 99%
“…Software countermeasures are often based on temporal redundancy (i.e. performing the same computation multiple times) to detect or tolerate errors during computations [10,12,8,11]. Control flow protection requires different mechanisms to detect a modification of the execution flow [7,14].…”
Section: Related Work (mentioning)
confidence: 99%
“…Extensions to the EDDI have been proposed [7] that achieve better efficiency by assuming reliable caches and memory, but still require redundant registers and instructions. Their experiments showed an average normalized execution time of 1.41, but without protection for system memory.…”
Section: Related Work (mentioning)
confidence: 99%
“…Such an assumption could be reasonable if malicious failures are unlikely and each replica maintain a checksum of its history, reporting a failure when it detects that its history is compromised. Alternatively, automated approaches that transform hardware errors into crash failures could be used [5,6]. There is no way to prevent malicious replicas from issuing truncated histories.…”
Section: Safety (mentioning)
confidence: 99%
“…Various studies in complex systems have shown that crash failures constitute a minority of failures [2,3], while trends in hardware increase the probability of transient hardware errors such as bit flips [4][5][6]. Worse yet, most replication protocols deployed in cloud centers provide weak consistency guarantees, meaning that they introduce inconsistencies even if there are no faults [7,8].…”
Section: Introduction (mentioning)
confidence: 99%