2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010
DOI: 10.1109/sc.2010.27

FlowChecker: Detecting Bugs in MPI Libraries via Message Flow Checking

Abstract: Many MPI libraries have suffered from software bugs, which severely impact the productivity of a large number of users. This paper presents a new method called FlowChecker for detecting communication-related bugs in MPI libraries. The main idea is to extract program intentions of message passing (MP-intentions), and to check whether these MP-intentions are fulfilled correctly by the underlying MPI libraries, i.e., whether messages are delivered correctly from specified sources to specified destinations…
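
To make the checking idea concrete, here is a minimal Python sketch. It assumes the MP-intentions and the observed deliveries have already been extracted as records of (source rank, destination rank, tag, content hash); the `Message` type, the `check_message_flow` function, and the `payload_hash` field are hypothetical illustrations, not FlowChecker's actual data structures or algorithm. The check treats both sides as multisets and reports intended messages that never arrived and deliveries that match no intention.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Message(NamedTuple):
    """Hypothetical record of one point-to-point message."""
    source: int        # sending rank
    dest: int          # receiving rank
    tag: int           # MPI-style message tag
    payload_hash: int  # stand-in for comparing message contents


def check_message_flow(
    intended: Iterable[Message], delivered: Iterable[Message]
) -> Dict[str, List[Message]]:
    """Compare intended messages against those the library actually delivered.

    Both inputs are treated as multisets, so a message intended twice but
    delivered once is reported as missing once.
    """
    want = Counter(intended)
    got = Counter(delivered)
    return {
        # intended but never delivered (lost, misrouted, or corrupted)
        "missing": sorted((want - got).elements()),
        # delivered without any matching intention
        "unexpected": sorted((got - want).elements()),
    }


if __name__ == "__main__":
    intended = [Message(0, 1, 7, 0xAB), Message(1, 2, 7, 0xCD)]
    # The second message is misrouted to rank 3 instead of rank 2.
    delivered = [Message(0, 1, 7, 0xAB), Message(1, 3, 7, 0xCD)]
    report = check_message_flow(intended, delivered)
    print(report["missing"])     # [Message(source=1, dest=2, tag=7, payload_hash=205)]
    print(report["unexpected"])  # [Message(source=1, dest=3, tag=7, payload_hash=205)]
```

Here the misrouted message shows up both as missing and as unexpected. A check of this shape only asks whether each intended message arrived, not how efficiently it got there, so a collective that delivers every message through a suboptimal algorithm passes it.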

Cited by 22 publications (10 citation statements). References 49 publications.

“…Instead of instrumenting each memory access in the MPI library, Profiler tracks data movement operations such as memory copy and network send/receive. This will not affect the detection capability of SyncChecker because the underlying MPI libraries often exploit such coarse-grained operations for transferring messages, i.e., copying out message to an intermediate memory location or directly sending message over the network [24], [62].…”
Section: B Profiler: Collecting Runtime Information (mentioning, confidence: 99%)

“…If no intersection is found, Analyzer simply discards the events of data movements since they are irrelevant to nonblocking communication. Similar technique has been used in our prior work [24], [62]. Otherwise, Analyzer performs the state transition for the identified message buffer based on the error detection state machine in Figure 3.…”
Section: Memory Access Instructions and Memory Management Routines (mentioning, confidence: 99%)
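
The filtering step described in the statement above can be sketched as follows, assuming the profiler logs each coarse-grained data movement (memory copy, network send, or network receive) as a start address and a byte count. The `MoveEvent` record and the `overlaps` and `relevant_events` helpers are hypothetical names, not the citing paper's code; the state-machine transitions of its Figure 3 are not reproduced, since the statement does not describe them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MoveEvent:
    """Hypothetical profiler record of one coarse-grained data movement."""
    kind: str  # e.g. "memcpy", "send", or "recv"
    addr: int  # start address of the range the operation touched
    size: int  # length of that range in bytes


def overlaps(a_start: int, a_size: int, b_start: int, b_size: int) -> bool:
    """True if [a_start, a_start + a_size) and [b_start, b_start + b_size) intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def relevant_events(
    events: Iterable[MoveEvent], buf_addr: int, buf_size: int
) -> List[MoveEvent]:
    """Keep events that touch the tracked message buffer; drop the rest.

    Events with no intersection are irrelevant to the buffer and are discarded;
    the surviving events are what a later state-machine step would consume.
    """
    return [e for e in events if overlaps(e.addr, e.size, buf_addr, buf_size)]


if __name__ == "__main__":
    # A tracked message buffer of 256 bytes at address 0x1000.
    events = [
        MoveEvent("memcpy", 0x1080, 64),   # inside the buffer
        MoveEvent("send", 0x2000, 128),    # unrelated memory
        MoveEvent("recv", 0x0F80, 0x100),  # straddles the buffer's start
    ]
    kept = relevant_events(events, 0x1000, 256)
    print([e.kind for e in kept])  # ['memcpy', 'recv']
```

Using half-open ranges keeps the intersection test to two comparisons and treats a range that ends exactly where the buffer begins as non-overlapping.
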
“…As the bug degrades the performance of Allgather but no deadlock is produced, those techniques targeted at temporal progress [5] will not work either. Finally, since there is no break in the message flow of Allgather as all messages are delivered eventually but with a suboptimal algorithm, FlowChecker [12] will not be able to detect this bug. Therefore, Vrisha is a good complement to these existing techniques for detecting subtle scale-dependent bugs in parallel programs.…”
Section: Comparison With Previous Techniques (mentioning, confidence: 99%)

“…With respect to bug localization, the requirement is to localize the bug to as small a portion of the code as possible so that the developer can correct the bug. These two motivations have spurred a significant volume of work in the HPC community, with a spurt being observable in the last five years [5,21,23,11,10,17,12]. Unlike prior work, we focus on bugs that manifest as software is scaled up.…”
Section: Introduction (mentioning, confidence: 99%)

“…Thus, debuggers are typically restricted to techniques that can be executed sequentially on the front-end node in a reasonable time. Recently, there are notable works, which focus on formal and semi-formal verification of MPI concurrency and message flow checking [23] [24]. However, we only focus on the challenges addressed above.…”
Section: Introduction (mentioning, confidence: 99%)