Proceedings of the 2016 International Conference on Supercomputing
DOI: 10.1145/2925426.2926264

SReplay

Abstract: Replay of parallel execution is required by HPC debuggers and resilience mechanisms. To date, there is no deterministic replay solution for one-sided communication. The essential problem is that readers of updated data have no information about which remote threads produced the updates, and conventional happens-before-based order-tracking techniques are difficult to make work at scale. This paper presents SReplay, the first software tool for sub-group deterministic record and replay for on…

Cited by 4 publications (2 citation statements)
References 60 publications (60 reference statements)

“…State-of-the-art record-and-replay tools such as ReMPI (Sato et al, 2015) target production-scale runs and prioritize scalability in terms of runtime and record size. Other record-and-replay tools target hybrid MPI + OpenMP executions (Budanur et al, 2012), MPI applications using one-sided communication (Qian et al, 2016b,a), replay of isolated subgroups of processes (Xue et al, 2009), and probabilistic replay (Park et al, 2009). In addition, tools such as NINJA (Sato et al, 2017) are used in conjunction with record-and-replay tools to improve the chances of capturing nondeterministic bugs.…”
Section: Existing Graph Algorithms In Hpc and Unaddressed Needs In No... (mentioning)
confidence: 99%
“…MPI's one-sided communication routines in particular pose unique challenges to R&R. Qian et al proposed two techniques for addressing this challenge: OPR [42] and its successor SReplay [43]. SReplay proposes a hybrid-replay scheme which permits replay of subgroups of processes.…”
Section: Debugging-centric Techniques (mentioning)
confidence: 99%