Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing 2013
DOI: 10.1145/2493123.2462908
A 1 PB/s file system to checkpoint three million MPI tasks

Cited by 34 publications (23 citation statements)
References 13 publications
“…The technique proposed by Frings et al [28] uses prefetching to increase the performance of loading parallel applications with dynamically linked libraries. Rajachandrasekar et al [86] propose a user-level file system that keeps checkpointing requests in main memory and transparently flushes them to persistent storage. Their approach includes support for remote direct memory access (RDMA).…”
Section: Caching and Prefetching
confidence: 99%
“…Burst buffer data access costs are modeled with device throughput and latency. Rajachandrasekar et al [86] use a model to estimate the throughput of their user-space file system CRUISE, which stores data in main memory and transparently flushes it to other persistent storage. Their model accounts for spill-over to SSDs and considers parameters such as the amount of data and the throughput of main memory and the SSD.…”
Section: Performance Modeling
confidence: 99%
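The spill-over model that statement describes can be sketched as a simple piecewise throughput estimate: data up to the memory capacity is absorbed at memory bandwidth, and the overflow is absorbed at SSD bandwidth. This is a minimal illustration with assumed figures, not the authors' actual model or parameter values:

```python
def checkpoint_time(data_bytes, mem_capacity, mem_bw, ssd_bw):
    """Estimate time to absorb a checkpoint when writes fill main
    memory first and any overflow spills to a local SSD."""
    in_mem = min(data_bytes, mem_capacity)       # portion held in memory
    spilled = max(0, data_bytes - mem_capacity)  # portion spilled to SSD
    return in_mem / mem_bw + spilled / ssd_bw

# Example (hypothetical numbers): 48 GiB checkpoint, 32 GiB memory
# buffer, 10 GiB/s memory bandwidth, 1 GiB/s SSD bandwidth.
GiB = 2 ** 30
t = checkpoint_time(48 * GiB, 32 * GiB, 10 * GiB, 1 * GiB)
# 32/10 + 16/1 = 19.2 seconds
```

The point the model captures is that once the in-memory region is exhausted, end-to-end checkpoint time is dominated by the slower spill device.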
“…al. proposed CRUISE, an in-memory file system that speeds up checkpointing [18]. In this system, each write's data is initially stored in a pre-allocated persistent memory region and afterwards flushed asynchronously to a PFS or a local file system.…”
Section: Related Work
confidence: 99%
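The write path described in that statement, writes landing in a pre-allocated memory region and a background worker draining them to slower storage, can be sketched as follows. The class, its interface, and the dict standing in for the PFS are all hypothetical illustration, not the actual CRUISE API:

```python
import threading
import queue

class MemCheckpointFS:
    """Sketch: writes copy into a pre-allocated in-memory region,
    then a background thread asynchronously flushes each write to
    backing storage (a dict stands in for the PFS here)."""

    def __init__(self, region_bytes):
        self.region = bytearray(region_bytes)  # pre-allocated region
        self.used = 0
        self.flush_q = queue.Ueue() if False else queue.Queue()
        self.backing = {}  # stand-in for the parallel file system
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, name, data):
        # Fast path: copy into the pre-allocated region...
        if self.used + len(data) > len(self.region):
            raise MemoryError("region full; a real system could spill to SSD")
        off = self.used
        self.region[off:off + len(data)] = data
        self.used += len(data)
        # ...then hand (name, offset, length) to the async flusher.
        self.flush_q.put((name, off, len(data)))

    def _drain(self):
        while True:
            name, off, n = self.flush_q.get()
            self.backing[name] = bytes(self.region[off:off + n])
            self.flush_q.task_done()

fs = MemCheckpointFS(1 << 20)
fs.write("ckpt.0", b"rank-0 state")
fs.flush_q.join()  # wait for the asynchronous flush to complete
# fs.backing["ckpt.0"] now holds b"rank-0 state"
```

The design point is that the application-visible `write` returns as soon as the memory copy finishes; persistence to slower storage happens off the critical path.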
“…One consists of featuring each computing node with local storage capability, ensuring through the hardware that this storage will remain available during a failure of the node. Another approach consists of using the memory of the other processors to store the checkpoint, pairing nodes as "buddies," thus taking advantage of the high bandwidth capability of the high speed network to design a scalable checkpoint storage mechanism [31,32,33,34].…”
Section: Weak Scalability
confidence: 99%
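The "buddy" scheme mentioned above can be illustrated with one common pairing rule: each even rank partners the next odd rank, so a copy of every checkpoint lives in a different node's memory. This is one possible pairing (the cited works [31,32,33,34] differ in detail):

```python
def buddy(rank, num_ranks):
    """Return the partner rank under even/odd pairing: 0<->1, 2<->3, ...
    Each rank sends a copy of its checkpoint to its buddy over the
    high-speed network; if one node of a pair fails, the survivor
    still holds both checkpoints."""
    if num_ranks % 2 != 0:
        raise ValueError("this pairing assumes an even number of ranks")
    return rank ^ 1  # flip the lowest bit to find the partner

# The pairing is symmetric: a rank is always its buddy's buddy.
# buddy(0, 8) == 1, buddy(1, 8) == 0, buddy(6, 8) == 7
```

Because every checkpoint transfer is a point-to-point exchange between fixed partners, aggregate checkpoint bandwidth scales with the number of node pairs rather than with the capacity of a shared file system.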