2011
DOI: 10.1007/s10766-011-0183-4

DAFT: Decoupled Acyclic Fault Tolerance

Abstract: Higher transistor counts, lower voltage levels, and reduced noise margin increase the susceptibility of multicore processors to transient faults. Redundant hardware modules can detect such errors, but software transient fault detection techniques are more appealing for their low cost and flexibility. Recent software proposals double register pressure or memory usage, or are too slow in the absence of hardware extensions, preventing widespread acceptance. This paper presents DAFT, a fast, safe, and memory effic…

citations
Cited by 46 publications
(58 citation statements)
references
References 34 publications
“…Each run is categorized as correct output, fail silent, access fault (invalid memory access), arithmetic error (e.g., divide by zero), or invalid instruction (in the current evaluation, the latter two categories were not applicable to the mentioned kernels or the fault-injection configuration used). This is similar to the approach used by Zhang et al [15]. Figure 5 illustrates the results of the evaluation.…”
Section: Fault Coverage (mentioning)
confidence: 63%
“…Given that not all Single Event Upset (SEU) events [15], [24] lead to a fault, we are currently looking into minimizing the number of redundant computations by taking into account the failure probability of the operation and its sensitivity to input changes. Finally, introducing redundancy at the high-level source may have the drawback that preserving the redundant computation may require that compiler optimizations be disabled or made more complicated, which could seriously hamper the introduction of resiliency transformations.…”
Section: Discussion (mentioning)
confidence: 99%
“…This is why the sphere of replication (SoR) in CASTED is limited within the processor only. This is common practice in the majority of software-based error detection methodologies [9][23][34][36].…”
Section: B Sphere Of Replication (mentioning)
confidence: 99%
“…Since they can be easily caught by a custom exception handler, they are usually part of the detected errors (as in [36]). In our case however, we show them as a separate type of errors for clarity.…”
Section: Fault Coverage Evaluation (mentioning)
confidence: 99%