2021
DOI: 10.1007/s11227-021-03892-4

Efficient detection of silent data corruption in HPC applications with synchronization-free message verification

Cited by 5 publications (1 citation statement)
References: 32 publications
“…Fault tolerant techniques for message passing interface (MPI) applications have been proposed to detect and recover [from] failures [7]-[11]. Failure detection using machine learning frameworks in high performance computing systems that automatically detect and diagnose failures [has also been proposed] [12]-[14]. [S]tate-of-the-art failure detection, prediction, and recovery techniques in exascale systems ha[ve been] introduced [15].…”
Section: Introduction (mentioning)
confidence: 99%