Proceedings of the Second International Workshop on Software Engineering for High Performance Computing System Applications 2005
DOI: 10.1145/1145319.1145342

Automated, scalable debugging of MPI programs with Intel® Message Checker

Abstract: The trend towards many-core multi-processor systems and clusters will make systems with tens and hundreds of processors more widely available. Current manual debugging techniques do not scale well to such large systems. Advanced automated debugging tools are needed for standard programming models based on commodity computing, such as threads and MPI. We surveyed MPI users to identify the kinds of MPI errors that they encounter, and classified the errors into several types. We describe how automated tools can det…

Cited by 72 publications (66 citation statements)
References: 3 publications
“…Dealing with dynamic tools able to check collective operations, we can mention DAMPI [13], Marmot [6,8], Umpire [12,8], MPI-CHECK [7,8], Intel Message Checker (IMC) [2,8] and MUST [5,4]. Umpire, Marmot and MUST rely on a dynamic analysis of MPI calls instrumented through the MPI profiling interface (PMPI).…”
Section: Online Dynamic Tools (mentioning, confidence: 99%)
“…This work directly relates to other runtime error detection approaches for MPI applications, which include Marmot [2], Umpire [3], ISP [11], MPI-Check [12], and Intel's approach [13]. While MUST is the successor of both Marmot and Umpire, the MPI-Check tool and Intel's approach use a timeout-based deadlock detection.…”
Section: Related Work (mentioning, confidence: 99%)
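The timeout-based deadlock detection attributed above to MPI-Check and Intel's approach can be illustrated with a minimal C sketch. It assumes a hypothetical checker that records, for each rank, whether the rank is currently blocked inside an MPI call and since when; the `rank_state` type, the `suspect_deadlock` function, and the all-ranks-blocked criterion are illustrative assumptions, not the documented behavior of either tool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-rank state a runtime checker might keep. */
typedef struct {
    bool   blocked;       /* true while the rank is inside a blocking MPI call */
    double blocked_since; /* wall-clock time (seconds) when that call started */
} rank_state;

/* Assumed criterion: report a suspected deadlock only when every rank has been
 * blocked inside an MPI call for longer than `timeout` seconds, i.e. no rank
 * has made progress within the threshold. */
bool suspect_deadlock(const rank_state *ranks, size_t nranks,
                      double now, double timeout)
{
    if (nranks == 0)
        return false;
    for (size_t i = 0; i < nranks; i++) {
        if (!ranks[i].blocked)
            return false; /* this rank is outside a blocking call, still running */
        if (now - ranks[i].blocked_since <= timeout)
            return false; /* blocked, but not yet for longer than the threshold */
    }
    return true;
}
```

Because a long computation or a slow message can also exceed the threshold, a timeout yields only a suspected deadlock; choosing the timeout trades detection latency against false positives.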
“…Without proper synchronization, the MPI application and the MPI library may simultaneously access the message buffer and therefore could corrupt message data or receive undefined data, leading to severe program failures such as crashes, hangs, or incorrect results. According to a recent survey on the importance and severity of MPI errors [12], programmers have ranked such synchronization errors as No. 6 out of 21 different error types.…”
Section: A Motivation (mentioning, confidence: 99%)
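The buffer synchronization errors described above arise when the application touches a buffer that still belongs to an incomplete nonblocking operation. The plain C sketch below shows one way a runtime checker could flag such accesses. The `pending_op` record and `find_buffer_conflict` function are hypothetical, the sketch assumes the checker learns about each application read or write by some other means (how accesses are intercepted is not shown), and the access rules follow MPI-3 semantics: a pending send buffer may be read but not written, and a pending receive buffer may be neither read nor written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an in-flight nonblocking operation, e.g. one posted
 * by MPI_Isend/MPI_Irecv and not yet completed by MPI_Wait/MPI_Test. */
typedef struct {
    uintptr_t start; /* address of the first byte of the message buffer */
    size_t    len;   /* buffer length in bytes */
    bool      recv;  /* true for a receive, false for a send */
} pending_op;

/* True if byte ranges [a, a+alen) and [b, b+blen) share at least one byte. */
static bool ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    return alen > 0 && blen > 0 && a < b + blen && b < a + alen;
}

/* Returns the index of the first pending operation whose buffer conflicts with
 * an application access to [addr, addr+len), or -1 if there is no conflict.
 * Writes conflict with any pending operation; reads conflict only with pending
 * receives, since the library may still be filling that buffer. */
long find_buffer_conflict(const pending_op *ops, size_t nops,
                          uintptr_t addr, size_t len, bool is_write)
{
    for (size_t i = 0; i < nops; i++) {
        if (!ranges_overlap(ops[i].start, ops[i].len, addr, len))
            continue;
        if (is_write || ops[i].recv)
            return (long)i;
    }
    return -1;
}
```

Under the older MPI-2 rules, even reading a pending send buffer was disallowed, so a checker targeting that standard would treat reads of send buffers as conflicts too.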
“…Realizing the importance of the reliability of parallel and distributed programs, researchers have proposed many dynamic techniques for interactive parallel debugging [40], [41], [42], [43], [44], [45], [46] and automatic bug detection [12], [19], [24], [47], [48], [49], [50]. Interactive parallel debuggers help programmers identify the bugs by exploiting automated information collection, aggregation, and visualization techniques.…”
Section: Bug Detection For Parallel and Distributed Programs (mentioning, confidence: 99%)