Abstract: Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data-absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute nodes' operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and for the design of I/O middleware systems on shared supercomputers.
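The straggler effect on coupled output mentioned above can be made concrete with a small back-of-the-envelope model (an illustration of the general principle, not the paper's measurement methodology): when a file is striped across N storage targets, a coupled write completes only when the slowest target finishes, so the effective aggregate bandwidth is bounded by N times the slowest target's bandwidth.

```python
def coupled_stripe_bandwidth(target_bandwidths):
    """Effective aggregate bandwidth of a write striped across N targets.

    Illustrative straggler model (not from the paper): each target receives
    an equal share of the data, the write completes when the slowest target
    finishes, so time = share / min(bandwidths) and the effective aggregate
    bandwidth is N * min(bandwidths).
    """
    n = len(target_bandwidths)
    return n * min(target_bandwidths)

# Three targets at 5 GB/s each would deliver 15 GB/s in the ideal case,
# but one straggler at 1 GB/s drags the coupled write down to 3 GB/s.
ideal = sum([5.0, 5.0, 5.0])
with_straggler = coupled_stripe_bandwidth([5.0, 5.0, 1.0])
```

This is why a single congested storage target or I/O router can depress delivered bandwidth far below the sum of per-target peaks.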
Lustre was initiated and funded, almost a decade ago, by the U.S. Department of Energy Office of Science and National Nuclear Security Administration laboratories to address the need for an open-source, highly scalable, high-performance parallel file system on then-present and future supercomputing platforms. Throughout the last decade, it has been deployed on numerous medium- to large-scale supercomputing platforms and clusters, and it has performed well and met the expectations of the Lustre user community. At the time of this writing, according to the Top500 list, 15 of the top 30 supercomputers in the world use the Lustre file system. This report aims to present a streamlined overview of how Lustre works internally, in reasonable detail, including the relevant data structures, APIs, protocols, and algorithms for the Lustre version 1.6 source code base. More importantly, the report attempts to explain how the various components interconnect and function as a system. Portions of the report are based on discussions with Oak Ridge National Laboratory Lustre Center of Excellence team members, and portions are based on the authors' understanding of how the code works. We, the authors, bear all responsibility for errors and omissions in this document. We can only hope the report helps current and future Lustre users and Lustre code developers as much as it helped us understand the Lustre source code and its internal workings.
The growth of computing power on large-scale systems requires commensurately high-bandwidth I/O systems. Many parallel file systems are designed to provide fast, sustained I/O in response to applications' soaring requirements. To meet this need, an intermediate system is needed to temporarily buffer bursty I/O and gradually flush datasets to long-term parallel file systems. In this paper, we introduce the design of BurstMem, a high-performance burst buffer system. BurstMem provides a storage framework with efficient storage and communication management strategies. Our experiments demonstrate that BurstMem is able to speed up the I/O performance of scientific applications by up to 8.5× on leadership computing systems.
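The buffer-then-flush pattern at the heart of a burst buffer can be sketched in a few lines. The following is a minimal illustrative sketch, not the BurstMem implementation: writes are absorbed into a fast staging queue and return immediately, while a background thread gradually drains staged datasets to the slow backing store (standing in for the parallel file system).

```python
import queue
import threading

class BurstBuffer:
    """Illustrative burst-buffer write path: absorb fast, flush gradually."""

    def __init__(self, backing_store):
        self.backing_store = backing_store        # stands in for the parallel file system
        self._staging = queue.Queue()             # fast absorption tier
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def write(self, dataset):
        # Returns immediately: the bursty output is staged, not blocked on the PFS.
        self._staging.put(dataset)

    def _drain(self):
        # Background flush: gradually move staged datasets to long-term storage.
        while True:
            dataset = self._staging.get()
            if dataset is None:                   # sentinel: stop draining
                break
            self.backing_store.append(dataset)

    def close(self):
        # Signal the drainer to finish, then wait until all data is flushed.
        self._staging.put(None)
        self._drainer.join()
```

A real burst buffer must additionally manage limited staging capacity, persistence across failures, and communication scheduling, which is where the storage and communication management strategies described in the abstract come in.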