2015 IEEE International Conference on Cluster Computing
DOI: 10.1109/cluster.2015.38

TRIO: Burst Buffer Based I/O Orchestration

Abstract: The growing computing power on leadership HPC systems is often accompanied by ever-escalating failure rates. Checkpointing is a common defensive mechanism used by scientific applications for failure recovery. However, directly writing the large and bursty checkpointing dataset to parallel file systems can incur significant I/O contention on storage servers. Such contention in turn degrades bandwidth utilization of storage servers and prolongs the average job I/O time of concurrent applications. Recently burst …

citations
Cited by 37 publications
(14 citation statements)
references
References 23 publications
“…CLARISSE does not consider node-local storage resources or full workflow/workload scheduling optimizations, however. TRIO [19] explores how to efficiently move large checkpointing datasets to the PFS by utilizing the burst buffers. Data Elevator [20] and Stacker [21] are similar to NORNS in that they focus on asynchronously moving data across I/O layers to optimize scientific workflows.…”
Section: Related Work (mentioning)
confidence: 99%
“…Unfortunately, while computing and network resources can be shared and managed effectively by state-of-the-art job schedulers, storage resources are still mostly considered as black boxes by these infrastructures [18]. While there has been increasing interest in HPC to use burst buffers to optimize the I/O path of data-driven workflows through autonomous, asynchronous data staging [19] [20] [21], these research efforts have not considered I/O as a first-class entity in resource scheduling decisions. Thus, we argue that the integration of application I/O needs with scheduling and resource managers is critical to effectively use and manage a hierarchical storage stack that can include as many layers as NVRAM, node-local burst buffers, shared burst buffers, parallel file system, campaign storage, and archival storage.…”
Section: Introduction (mentioning)
confidence: 99%
“…BurstMem [12] extends and modifies Memcached to work on burst buffers and provide a mechanism of coordinated data shuffling and flushing to the PFS, while IBIO [13] explores how burst buffers can be used to improve resiliency by testing a wide range of checkpoint/restart strategies. Conversely, TRIO [14] proposes an orchestration framework to efficiently move large checkpointing datasets to the PFS with efficiently utilized storage bandwidth and reduced job I/O time.…”
Section: Related Work (mentioning)
confidence: 99%
“…Many works in this area have been dedicated to mitigating the increasingly serious problem of cross-application I/O interference. An I/O orchestration mechanism named TRIO is proposed to coordinate bursty writes of checkpoint data on I/O nodes for better sequential write traffic to storage nodes. TRIO alleviates the contention on storage nodes by controlling the write order among multiple I/O nodes.…”
Section: Related Work (mentioning)
confidence: 99%
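
The last statement describes write-order control among I/O nodes only in outline. As a rough illustration of the general idea, and not TRIO's actual algorithm, the following toy sketch arranges flushes into rounds so that each storage node serves at most one I/O node at a time. The function name `orchestrate_flush`, the node labels `bb0`/`bb1`, and the target labels `ost0`/`ost1` are hypothetical and do not come from the paper.

```python
def orchestrate_flush(pending):
    """Toy write-order arbiter (illustrative assumption, not TRIO itself).

    pending maps each I/O node to the set of storage targets for which it
    holds buffered checkpoint data. The result is a list of rounds; each
    round maps an I/O node to the single target it flushes to, and no
    target appears twice within a round.
    """
    remaining = {node: set(targets) for node, targets in pending.items()}
    rounds = []
    while any(remaining.values()):
        busy_targets = set()   # targets already granted in this round
        this_round = {}
        for node in sorted(remaining):
            # Pick the first target this node still needs that is not busy.
            free = sorted(remaining[node] - busy_targets)
            if free:
                target = free[0]
                this_round[node] = target
                busy_targets.add(target)
                remaining[node].discard(target)
        rounds.append(this_round)
    return rounds


# Two hypothetical burst-buffer nodes, each with data for both targets.
schedule = orchestrate_flush({
    "bb0": {"ost0", "ost1"},
    "bb1": {"ost0", "ost1"},
})
for number, assignment in enumerate(schedule, start=1):
    print(number, assignment)
```

In this sketch every storage target sees a single writer per round, which is one simple way to avoid interleaved streams from several I/O nodes; a real system would also have to weigh segment sizes, bandwidth, and fairness.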