2020
DOI: 10.1016/j.future.2020.06.003

Extending the OpenCHK Model with advanced checkpoint features

Cited by 6 publications (6 citation statements)
References 12 publications
“…For system-level fault tolerance, most approaches rely on rollback recovery. A system-level approach is given in [15], proposing compiler instructions for allowing users to specify checkpoint/restart operations, supporting from basic to advanced mechanisms currently available on dedicated libraries and the using of fault-tolerance-dedicated threads. Another system-level strategy proposes extensions to the Distem emulator [16], enabling it to evaluate fault tolerance and load balancing mechanisms in real HPC Runtimes Charm++, MPICH, and OpenMPI.…”
Section: Current Fault Tolerance Approaches
Citation type: mentioning
confidence: 99%
“…Similarly to Maroñas et al [57], BT's MPI implementation was expanded to support scalable checkpoint recovery (SCR). That is, at the end of each iteration, the solver inner state is checkpointed.…”
Section: DCPMM as Checkpoint/Restart (C/R) Storage
Citation type: mentioning
confidence: 99%
“…x[0], x[5], x[10], x[15] #pragma oss taskloop inout(x[i]) grainsize(5) for (i = 0; i < 20; i++) {...}…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Some of the most important are Intel TBB [65], OpenMP [84], CUDA [69] or MPI [117]. They can be classified in several different ways: shared or distributed memory, sup- In this thesis, we contribute to two programming models: OmpSs-2 [14] and OpenCHK [15]. OmpSs-2 is an already existing programming model that we enhanced with novel features.…”
Section: Programming Models
Citation type: mentioning
confidence: 99%