Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2020
DOI: 10.1145/3332466.3374515

Detecting and reproducing error-code propagation bugs in MPI implementations

Abstract: We present an approach to automatically detect and reproduce error-code propagation bugs in MPI implementations. Specifically, we combine static analysis and program repair for bug detection, and apply fault injection to reproduce error-code propagation bugs found in MPI libraries written in C. We demonstrate our approach on the MPICH library, one of the most popular implementations of MPI, and the MPICH-based implementation MVAPICH, uncovering 447 previously unknown bugs. We discovered that 31 of these bugs result …
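
As a rough illustration of the bug class named in the abstract, the sketch below shows one common shape of an error-code propagation bug in C: a callee reports a failure, and the caller overwrites the code before checking it. This is a minimal sketch under stated assumptions, not the paper's tooling and not MPICH code. Every name in it (`DEMO_SUCCESS`, `DEMO_ERR_IO`, `inject_fault`, `low_level_write`, `send_buggy`, `send_fixed`) is hypothetical, and the global flag merely stands in for the general idea of fault injection.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes for illustration only (not MPICH's real values). */
enum { DEMO_SUCCESS = 0, DEMO_ERR_IO = 1 };

/* Hypothetical fault-injection switch: when nonzero, the lowest layer fails. */
int inject_fault = 0;

/* Lowest layer: the place where an error originates. */
int low_level_write(const char *buf)
{
    (void)buf; /* the payload is irrelevant to this sketch */
    return inject_fault ? DEMO_ERR_IO : DEMO_SUCCESS;
}

/* Buggy caller: the error code is overwritten before anyone checks it. */
int send_buggy(const char *buf)
{
    int err = low_level_write(buf);
    err = DEMO_SUCCESS; /* bug: the failure reported above is lost here */
    return err;
}

/* Repaired caller: the error code is checked and passed up unchanged. */
int send_fixed(const char *buf)
{
    int err = low_level_write(buf);
    if (err != DEMO_SUCCESS)
        return err;
    return DEMO_SUCCESS;
}
```

With `inject_fault` set to 1, `send_buggy` still returns `DEMO_SUCCESS` while `send_fixed` returns `DEMO_ERR_IO`. Forcing the failure is what makes the dropped error observable, since the bug stays silent for as long as the lowest layer succeeds.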

Citations: Cited by 10 publications (1 citation statement)
References: References 28 publications (25 reference statements)
“…Multi-node communication in large-scale parallel computing applications is typically facilitated through the Message-Passing Interface (MPI) [17]. The MPI standard, established by the MPI Forum, provides a foundation for message-passing libraries [16].…”
Section: Introduction (mentioning)
confidence: 99%