2009
DOI: 10.1007/978-3-642-03770-2_19

VolpexMPI: An MPI Library for Execution of Parallel Applications on Volatile Nodes

Cited by 39 publications (36 citation statements)
References 10 publications

“…In contrast, this work (i) considers dependent tasks such as found in applications consisting of linear workflows; and (ii) proposes an optimal dynamic programming algorithm to solve the selective replication and checkpointing problem. Combining replication with checkpointing has also been proposed in [29,41,16] for HPC platforms, and in [22,37] for grid computing.…”
Section: Replication (mentioning; confidence: 99%)

“…Recent advances include multi-level approaches, or the use of SSD or NVRAM as secondary storage [14]. Combining replication with checkpointing has been proposed in [41,49,25] for HPC platforms, and in [33,46] for grid computing.…”
Section: Replication For Fail-stop Errors (mentioning; confidence: 99%)

“…The idea of node duplication via PMPI has been used in the fault tolerance community, particularly rMPI [7], MR-MPI [6] and VolPEX [11]. Here, duplication ensures that if any particular node goes down its duplicate will step in to allow execution to continue without interruption.…”
Section: Related Work (mentioning; confidence: 99%)
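The duplication idea in the last excerpt can be illustrated with a toy, single-process model. This is a hedged sketch only: it is not VolpexMPI's implementation, it does not use MPI or the PMPI profiling interface, and the names Replica, LogicalRank, send, and Receiver are hypothetical. Each logical rank is backed by several replicas on different nodes. A send reaches every live replica, so the rank survives as long as one copy is alive. Because a sender's replicas each emit the same message, the receiver drops copies it has already seen, keyed (as an assumption) on source rank and sequence number.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """One physical copy of a logical rank (hypothetical toy model)."""
    host: str
    alive: bool = True


@dataclass
class LogicalRank:
    """A logical rank backed by one or more replicas on different nodes."""
    rank: int
    replicas: list[Replica] = field(default_factory=list)

    def live_hosts(self) -> list[str]:
        # Any surviving replica can keep the logical rank running.
        return [r.host for r in self.replicas if r.alive]


def send(dest: LogicalRank, payload: str) -> list[str]:
    """Deliver payload to every live replica of dest; return the hosts reached.

    Raises RuntimeError only when every replica of dest has failed.
    """
    hosts = dest.live_hosts()
    if not hosts:
        raise RuntimeError(f"logical rank {dest.rank}: all replicas failed")
    # A real transport would transmit `payload` to each host here; this toy
    # model only reports which replicas would receive it.
    return hosts


class Receiver:
    """Keeps the first copy of each message and drops copies from other replicas."""

    def __init__(self) -> None:
        self.seen: set[tuple[int, int]] = set()
        self.delivered: list[str] = []

    def on_message(self, src_rank: int, seq: int, payload: str) -> bool:
        key = (src_rank, seq)
        if key in self.seen:
            return False  # duplicate sent by another replica of src_rank
        self.seen.add(key)
        self.delivered.append(payload)
        return True
```

For example, with `rank1 = LogicalRank(1, [Replica("node-a"), Replica("node-b")])`, `send(rank1, "x")` returns `['node-a', 'node-b']`; after `rank1.replicas[0].alive = False` it returns `['node-b']`, so the logical rank keeps running on its duplicate. Deduplicating on (source rank, sequence number) is one simple choice for illustration; real systems may identify redundant messages differently.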