Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing 2013
DOI: 10.1145/2493123.2462908
A 1 PB/s file system to checkpoint three million MPI tasks

Cited by 34 publications (23 citation statements)
References 13 publications
“…The technique proposed by Frings et al [28] uses prefetching to increase the performance of loading parallel applications with dynamically linked libraries. Rajachandrasekar et al [86] propose a user-level file system that keeps checkpointing requests in main memory and transparently flushes them to persistent storage. Their approach includes support for remote direct memory access (RDMA).…”
Section: Caching and Prefetching
confidence: 99%
“…Burst buffer data access costs are modeled with device throughput and latency. Rajachandrasekar et al [86] use a model to estimate the throughput of their user-space file system CRUISE, which stores data in main memory and transparently flushes it to other persistent storage. Their model accounts for spill-over to SSDs and considers parameters such as the amount of data and the throughput of main memory and the SSD.…”
Section: Performance Modeling
confidence: 99%
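The spill-over model that statement describes can be sketched as a simple piecewise throughput estimate: data up to the memory capacity is absorbed at memory bandwidth, and the overflow is absorbed at SSD bandwidth. This is a minimal illustration with assumed figures, not the authors' actual model or parameter values:

```python
def checkpoint_time(data_bytes, mem_capacity, mem_bw, ssd_bw):
    """Estimate time to absorb a checkpoint when writes fill main
    memory first and any overflow spills to a local SSD."""
    in_mem = min(data_bytes, mem_capacity)       # portion held in memory
    spilled = max(0, data_bytes - mem_capacity)  # portion spilled to SSD
    return in_mem / mem_bw + spilled / ssd_bw

# Example (hypothetical numbers): 48 GiB checkpoint, 32 GiB memory
# buffer, 10 GiB/s memory bandwidth, 1 GiB/s SSD bandwidth.
GiB = 2 ** 30
t = checkpoint_time(48 * GiB, 32 * GiB, 10 * GiB, 1 * GiB)
# 32/10 + 16/1 = 19.2 seconds
```

The point the model captures is that once the in-memory region is exhausted, end-to-end checkpoint time is dominated by the slower spill device.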
“…al. proposed CRUISE, an in-memory file system that speeds up checkpointing [18]. In this system, each write's data is initially stored in a pre-allocated persistent memory region and afterwards flushed asynchronously to a PFS or a local file system.…”
Section: Related Work
confidence: 99%
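The write path described in that statement, writes landing in a pre-allocated memory region and a background worker draining them to slower storage, can be sketched as follows. The class, its interface, and the dict standing in for the PFS are all hypothetical illustration, not the actual CRUISE API:

```python
import threading
import queue

class MemCheckpointFS:
    """Sketch: writes copy into a pre-allocated in-memory region,
    then a background thread asynchronously flushes each write to
    backing storage (a dict stands in for the PFS here)."""

    def __init__(self, region_bytes):
        self.region = bytearray(region_bytes)  # pre-allocated region
        self.used = 0
        self.flush_q = queue.Ueue() if False else queue.Queue()
        self.backing = {}  # stand-in for the parallel file system
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, name, data):
        # Fast path: copy into the pre-allocated region...
        if self.used + len(data) > len(self.region):
            raise MemoryError("region full; a real system could spill to SSD")
        off = self.used
        self.region[off:off + len(data)] = data
        self.used += len(data)
        # ...then hand (name, offset, length) to the async flusher.
        self.flush_q.put((name, off, len(data)))

    def _drain(self):
        while True:
            name, off, n = self.flush_q.get()
            self.backing[name] = bytes(self.region[off:off + n])
            self.flush_q.task_done()

fs = MemCheckpointFS(1 << 20)
fs.write("ckpt.0", b"rank-0 state")
fs.flush_q.join()  # wait for the asynchronous flush to complete
# fs.backing["ckpt.0"] now holds b"rank-0 state"
```

The design point is that the application-visible `write` returns as soon as the memory copy finishes; persistence to slower storage happens off the critical path.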
“…One consists of featuring each computing node with local storage capability, ensuring through the hardware that this storage will remain available during a failure of the node. Another approach consists of using the memory of the other processors to store the checkpoint, pairing nodes as "buddies," thus taking advantage of the high bandwidth capability of the high speed network to design a scalable checkpoint storage mechanism [31,32,33,34].…”
Section: Weak Scalability
confidence: 99%
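The "buddy" scheme mentioned above can be illustrated with one common pairing rule: each even rank partners the next odd rank, so a copy of every checkpoint lives in a different node's memory. This is one possible pairing (the cited works [31,32,33,34] differ in detail):

```python
def buddy(rank, num_ranks):
    """Return the partner rank under even/odd pairing: 0<->1, 2<->3, ...
    Each rank sends a copy of its checkpoint to its buddy over the
    high-speed network; if one node of a pair fails, the survivor
    still holds both checkpoints."""
    if num_ranks % 2 != 0:
        raise ValueError("this pairing assumes an even number of ranks")
    return rank ^ 1  # flip the lowest bit to find the partner

# The pairing is symmetric: a rank is always its buddy's buddy.
# buddy(0, 8) == 1, buddy(1, 8) == 0, buddy(6, 8) == 7
```

Because every checkpoint transfer is a point-to-point exchange between fixed partners, aggregate checkpoint bandwidth scales with the number of node pairs rather than with the capacity of a shared file system.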