Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2015
DOI: 10.1145/2807591.2807642

Clock delta compression for scalable order-replay of non-deterministic parallel applications

Cited by 19 publications (20 citation statements)
References 26 publications
“…As depicted in Figure 1, if an application is not deterministic, then external methods can be used to make it deterministic. For example, one can identify and fix races with a race detector such as Archer [4], or directly determinize an execution using a capture-playback framework such as ReMPI [33].…”
Section: Workflow For Multi-level Analysis (mentioning)
confidence: 99%
“…MCB is implemented primarily using MPI's non-blocking point-to-point communication primitives, specifically making heavy use of non-blocking matching functions such as MPI_Testsome during particle exchange. This leads to significant nondeterminism which can be directly observed in terms of differing numerical outputs from run to run (Cleveland et al, 2013; Sato et al, 2015). MCB's executions can be decomposed into three distinct communication patterns (i.e.…”
Section: Monte Carlo Benchmark (mentioning)
confidence: 99%
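
The statement above links nondeterministic message matching to run-to-run differences in numerical output. One common mechanism, assumed here for illustration and not attributed to MCB in the quoted text, is that floating-point addition is not associative, so accumulating the same contributions in a different arrival order can change the result. A minimal sketch in Python, with made-up values and a hypothetical `accumulate` helper:

```python
import itertools


def accumulate(contributions):
    """Sum values strictly in the order they 'arrive' (left to right)."""
    total = 0.0
    for value in contributions:
        total += value
    return total


# Hypothetical contributions from four senders; the magnitudes are chosen
# so that rounding makes the arrival order visible in the result.
contributions = [1e16, 1.0, -1e16, 1.0]

# Every permutation stands in for one possible message arrival order.
distinct_totals = {accumulate(p) for p in itertools.permutations(contributions)}

print(sorted(distinct_totals))
```

The exact sum of these four values is 2.0, but only some arrival orders produce it; the others lose one or both of the small terms to rounding.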
“…Tools for addressing some aspects of the nondeterministic problem have emerged, but they do not provide methods for systematically cataloging the nondeterminism in a given application. For example, record-and-replay (R&R) techniques (Chapp et al, 2018; Taufer et al, 2005) present an attractive method for mitigating the harmful aspects of nondeterminism in HPC applications, such as numerical irreproducibility (Chapp et al, 2015) and hampered debugging (Sato et al, 2015, 2017), but are restricted by their limited scalability (i.e. they cannot effectively record executions of HPC applications at the peta- and exascale).…”
Section: Introduction (mentioning)
confidence: 99%
“…While in many MPI programs, nondeterminism can be controlled by eliminating application sources of nondeterminism, such as calls to rand() and/or time(), in other programs this is difficult because of nondeterminism introduced by MPI point-to-point communication patterns. To address these applications, we rely on record-and-replay tools [39], [40], on which a fault-free run is recorded and it is then replayed in all subsequent faulty executions.…”
Section: B Parallel Tracing Overhead (mentioning)
confidence: 99%
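
Several of the statements above describe record-and-replay: one run is recorded, and later runs are forced to follow the same message order. The idea can be sketched as a toy simulation in Python. This is an illustrative sketch only, not ReMPI's implementation and not the clock delta compression scheme of the cited paper; the function names, the one-message-per-sender assumption, and the use of a shuffled list as a stand-in for network timing are all assumptions made for the example.

```python
import random


def record_run(messages, rng):
    """Record mode: deliver messages in arrival order and log the senders."""
    arrival = list(messages)
    rng.shuffle(arrival)  # stand-in for nondeterministic network timing
    return [sender for sender, _payload in arrival]


def replay_run(messages, rng, recorded_order):
    """Replay mode: hold back early arrivals until the log says it is their turn."""
    arrival = list(messages)
    rng.shuffle(arrival)  # arrival order may differ from the recorded run
    pending = {}          # messages that arrived before their recorded turn
    delivered = []
    next_index = 0
    for sender, payload in arrival:
        pending[sender] = payload
        # Deliver as many messages as the recorded order currently allows.
        while (next_index < len(recorded_order)
               and recorded_order[next_index] in pending):
            turn = recorded_order[next_index]
            pending.pop(turn)
            delivered.append(turn)
            next_index += 1
    return delivered


# Hypothetical workload: one message from each of five sender ranks.
messages = [(rank, f"payload-{rank}") for rank in range(5)]

log = record_run(messages, random.Random(1))
replayed = replay_run(messages, random.Random(2), log)

print(log == replayed)
```

Because every message eventually arrives and is buffered until its recorded turn, the replayed delivery order matches the log regardless of the seed used for the second run.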