Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing 2014
DOI: 10.1145/2600212.2600224

Fault tolerance for remote memory access programming models

Abstract: Remote Memory Access (RMA) is an emerging mechanism for programming high-performance computers and datacenters. However, little work exists on resilience schemes for RMA-based applications and systems. In this paper we analyze fault tolerance for RMA and show that it is fundamentally different from resilience mechanisms targeting the message passing (MP) model. We design a model for reasoning about fault tolerance for RMA, addressing both flat and hierarchical hardware. We use this model to construct several h…

Citations: cited by 34 publications (43 citation statements)
References: 66 publications (182 reference statements)

“…Recent predictions about mean time between failures (MTBF) of large-scale systems indicate failures every few hours [9]. Fault tolerance can be achieved with various mechanisms.…”
Section: Enabling Incremental Checkpointing (mentioning)
confidence: 99%
“…Fault tolerance can be achieved with various mechanisms. In checkpoint/restart [9] all processes synchronize and record their state to memories or disks. Traditional checkpointing schemes record the same amount of data during every checkpoint.…”
Section: Enabling Incremental Checkpointing (mentioning)
confidence: 99%
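
The contrast drawn in the statement above, between a traditional checkpoint that records the same amount of data every time and an incremental one, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the tiny block size, the fixed-size state, and the use of SHA-256 hashes to detect changed blocks are hypothetical and are not taken from the cited paper or from the citing publication.

```python
import hashlib

BLOCK_SIZE = 4  # bytes per block; deliberately tiny for illustration (assumption)


def split_blocks(state, block_size=BLOCK_SIZE):
    """Cut the process state into fixed-size blocks."""
    return [state[i:i + block_size] for i in range(0, len(state), block_size)]


def full_checkpoint(state):
    """Traditional scheme: record every block on every checkpoint."""
    return dict(enumerate(split_blocks(state)))


def incremental_checkpoint(state, last_hashes):
    """Record only the blocks whose hash differs from the previous checkpoint.

    Returns the recorded blocks and the hashes to pass to the next call.
    """
    delta = {}
    new_hashes = {}
    for idx, block in enumerate(split_blocks(state)):
        digest = hashlib.sha256(block).hexdigest()
        new_hashes[idx] = digest
        if last_hashes.get(idx) != digest:
            delta[idx] = block
    return delta, new_hashes


def restore(base, deltas):
    """Rebuild the state from a full checkpoint plus later deltas, in order.

    Assumes the state size never changes between checkpoints.
    """
    blocks = dict(base)
    for delta in deltas:
        blocks.update(delta)
    return b"".join(blocks[i] for i in sorted(blocks))


if __name__ == "__main__":
    state0 = b"AAAABBBBCCCC"
    base = full_checkpoint(state0)                    # records all 3 blocks
    _, hashes = incremental_checkpoint(state0, {})    # seed the hash table

    state1 = b"AAAAXXXXCCCC"                          # only the middle block changes
    delta, hashes = incremental_checkpoint(state1, hashes)
    print(len(base), len(delta))                      # blocks recorded by each scheme
    print(restore(base, [delta]) == state1)
```

In this sketch the full checkpoint always stores every block, while the incremental one stores only what changed since the last call, at the cost of hashing the whole state and keeping one digest per block.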