TRIO: Burst Buffer Based I/O Orchestration

Wang, Teng; Oral, Sarp; Pritchard, Michael; Wang, Bin; Yu, Weikuan

doi:10.1109/cluster.2015.38

Cited by 37 publications

(14 citation statements)

References 23 publications

“…CLARISSE does not consider node-local storage resources or full workflow/workload scheduling optimizations, however. TRIO [19] explores how to efficiently move large checkpointing datasets to the PFS by utilizing the burst buffers. Data Elevator [20] and Stacker [21] are similar to NORNS in that they focus on asynchronously moving data across I/O layers to optimize scientific workflows.…”

Section: Related Work (mentioning)

confidence: 99%

“…Unfortunately, while computing and network resources can be shared and managed effectively by state-of-the-art job schedulers, storage resources are still mostly considered as black boxes by these infrastructures [18]. While there has been increasing interest in HPC to use burst buffers to optimize the I/O path of data-driven workflows through autonomous, asynchronous data staging [19] [20] [21], these research efforts have not considered I/O as a first class entity in resource scheduling decisions. Thus, we argue that the integration of application I/O needs with scheduling and resource managers is critical to effectively use and manage a hierarchical storage stack that can include as many layers as NVRAM, node-local burst buffers, shared burst buffers, parallel file system, campaign storage, and archival storage.…”

Section: Introduction (mentioning)

confidence: 99%

NORNS: Extending Slurm to Support Data-Driven Workflows through Asynchronous Data Staging

Miranda, Jackson, Tocci, et al. 2019

2019 IEEE International Conference on Cluster Computing (CLUSTER)

As HPC systems move into the Exascale era, parallel file systems are struggling to keep up with the I/O requirements from data-intensive problems. While the inclusion of burst buffers has helped to alleviate this by improving I/O performance, it has also increased the complexity of the I/O hierarchy by adding additional storage layers each with its own semantics. This forces users to explicitly manage data movement between the different storage layers, which, coupled with the lack of interfaces to communicate data dependencies between jobs in a data-driven workflow, prevents resource schedulers from optimizing these transfers to benefit the cluster's overall performance. This paper proposes several extensions to job schedulers, prototyped using the Slurm scheduling system, to enable users to appropriately express the data dependencies between the different phases in their processing workflows. It also introduces a new service for asynchronous data staging called NORNS that coordinates with the job scheduler to orchestrate data transfers to achieve better resource utilization. Our evaluation shows that a workflow-aware Slurm exploits node-local storage more effectively, reducing the filesystem I/O contention and improving job running times.

“…BurstMem [12] extends and modifies Memcached to work on burst buffers and provide a mechanism of coordinated data shuffling and flushing to the PFS, while IBIO [13] explores how burst buffers can be used to improve resiliency by testing a wide range of checkpoint/restart strategies. Conversely, TRIO [14] proposes an orchestration framework to efficiently move large checkpointing datasets to the PFS with efficiently utilized storage bandwidth and reduced job I/O time.…”

Section: Related Work (mentioning)

confidence: 99%

ECHOFS: A Scheduler-Guided Temporary Filesystem to Leverage Node-Local NVMS

Miranda, Nou, Cortés 2018

2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary filesystems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present the results measured with NVM emulation, and different FS backends with DAX/FUSE on a local node, to show the benefits of our proposal and such coordination.

“…Many works in this area have been dedicated to mitigating the increasingly serious problem of cross‐application I/O interference. An I/O orchestration mechanism named TRIO is proposed to coordinate bursty writes of checkpoint data on I/O nodes for better sequential write traffic to storage nodes. TRIO alleviates the contention on storage nodes by controlling the write orders among multiple I/O nodes.…”

Section: Related Work (mentioning)

confidence: 99%

Cross‐layer coordination in the I/O software stack of extreme‐scale systems

Liu et al. 2017

Concurrency and Computation

Summary: The I/O forwarding layer has now become a standard storage layer in today's HPC systems in order to scale current storage systems to new levels of concurrency. With the deepening of the storage hierarchy, I/O requests must traverse several types of nodes to access the required data, including compute nodes, I/O nodes, and storage nodes. It becomes difficult to control the data path and apply cross‐layer I/O optimization. In this paper, we propose a well-coordinated I/O stack, which coordinates the data path between compute nodes and I/O nodes for better load balancing and data locality with a job‐level I/O node mapping, and coordinates the data path between I/O nodes and storage nodes for lighter I/O interference. We implement and evaluate our ideas on Tianhe‐1A by leveraging an open‐source I/O forwarding layer named IOFSL. The experimental results show that our proposals can significantly accelerate the I/O performance of multiple I/O kernels and real applications.
