2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (2014)
DOI: 10.1109/dsn.2014.48

A-ABFT: Autonomous Algorithm-Based Fault Tolerance for Matrix Multiplications on Graphics Processing Units

Cited by 33 publications (15 citation statements)
References 26 publications
“…Braun et al [4] used a simple fault injector (for matrix multiplication) that was able to inject faults into a target SM and one of its functional units. Ours can also target a specific SM and SP, but has the added capability of allowing for variable fault start time and active duration.…”
Section: Related Work (mentioning)
confidence: 99%
“…Similar to other works [15], [16], Our approach uses the reciprocal distribution of mantissa bits to calculate the rounding error bound for a block checksum. The principle of the method is based on the fact that matrix multiplication consists of multiple steps of multiplication and addition, and the rounding error bound can be obtained by calculating expectation and variance during those steps.…”
Section: B. Block Size and Rounding Error (mentioning)
confidence: 99%
“…A simplified error analysis (SEA) approach for ABFT is introduced in [23]. A-ABFT calculates the range of rounding errors through the probability distribution of floating-point tails [15].…”
Section: Related Work (mentioning)
confidence: 99%
“…It imposes a low overhead on the application and guarantees a good SDC detection recall in general. Some recent studies [8] have shown that ABFT can be implemented for matrix multiplications on hardware accelerators. In addition to its detection capability, ABFT offers correction features.…”
Section: Algorithm-based Fault Tolerance (mentioning)
confidence: 99%
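
The detection and correction features mentioned in the statement above can be illustrated with the classic checksum scheme for matrix multiplication. The following is a minimal sketch in plain Python, not the A-ABFT implementation: the function names (`matmul`, `encode`, `check`, `correct`) are hypothetical, and the fixed tolerance `tol` is an assumption made for illustration, whereas the cited statements describe A-ABFT as deriving its rounding error bounds from a probabilistic analysis.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def encode(a, b):
    """Append a column-checksum row to A and a row-checksum column to B."""
    a_c = a + [[sum(col) for col in zip(*a)]]
    b_r = [row + [sum(row)] for row in b]
    return a_c, b_r

def check(c_full, tol=1e-9):
    """Return (row, col) positions where both the row and the column checksum disagree.

    tol is a fixed illustrative threshold (an assumption), not a derived bound.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    return [(i, j) for i in bad_rows for j in bad_cols]

def correct(c_full, i, j):
    """Rebuild a single faulty element from its row checksum."""
    m = len(c_full[0]) - 1
    c_full[i][j] = c_full[i][m] - sum(c_full[i][k] for k in range(m) if k != j)

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
a_c, b_r = encode(a, b)
c_full = matmul(a_c, b_r)      # full-checksum result, one extra row and column
clean = check(c_full)          # no mismatches expected on a fault-free product

c_full[0][1] += 5              # simulate a single corrupted result element
located = check(c_full)        # the intersection of the bad row and bad column
correct(c_full, *located[0])   # restore the element from the row checksum
```

With integer inputs the checksums match exactly; with floating-point data the comparison needs a tolerance that separates rounding noise from real faults, which is the problem the rounding error bounds discussed in the statements above address.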