Software-defined QoS for I/O in exascale computing

Hua, Yusheng; Shi, Xuanhua; Jin, Hai; Liu, Wei; Jiang, Yao; Chen, Yong; He, Ligang

doi:10.1007/s42514-019-00005-9

Cited by 8 publications

(4 citation statements)

References 20 publications


“…At each time-step, the progress of each application is monitored as the number of I/O transfers that have been granted so far. In a related paper [11], the authors survey I/O capabilities of state-of-the-art supercomputers and enforce QoS constraints for I/O transfers by implementing a token-based bucket algorithm that works similarly to that of [24]. Finally, the authors of [23] target a system with several I/O sub-systems (OST, which stands for Object Storage Target, typically a RAID array of disks).…”

Section: I/O-cop · mentioning

confidence: 99%

Revisiting I/O bandwidth-sharing strategies for HPC applications

Benoit, Herault, Perotin et al. 2024

Journal of Parallel and Distributed Computing
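The statement quoted from this paper measures each application's progress, at every time-step, as the number of I/O transfers granted so far. The sketch below illustrates that bookkeeping with a hypothetical least-progress-first policy. The policy, the `App` record and the `grant_step` function are our own assumptions for illustration. They are not the strategy of Benoit et al. or the token-bucket scheme of Hua et al. Each step offers a fixed number of transfer slots, and every slot goes to the pending application that has received the fewest grants.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    pending: int      # I/O transfers still waiting to be issued
    granted: int = 0  # progress metric: transfers granted so far


def grant_step(apps: list[App], slots: int) -> list[str]:
    """Hypothetical policy: hand out up to `slots` transfers in one time-step,
    one at a time, always to the pending application with the least progress."""
    grants = []
    for _ in range(slots):
        candidates = [a for a in apps if a.pending > 0]
        if not candidates:
            break  # nothing left to schedule in this step
        # min() returns the first minimum, so ties are broken by list order.
        app = min(candidates, key=lambda a: a.granted)
        app.pending -= 1
        app.granted += 1
        grants.append(app.name)
    return grants


apps = [App("A", pending=5), App("B", pending=1), App("C", pending=3)]
print(grant_step(apps, 4))  # one time-step with 4 transfer slots
```

Under this policy a short application such as `B` finishes early instead of waiting behind `A`. A real bandwidth-sharing strategy would also have to weigh transfer sizes, application priorities and compute phases, which this sketch ignores.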



“…PADLL is able to control the rate of both data and metadata workflows. Other systems are directly implemented within core layers of the HPC I/O stack, including the PFS [14], [18], [20], [22], [23], scheduler [21], and I/O libraries [16], [17]. These solutions are intrusive and offer limited maintainability and portability.…”

Section: Related Work · mentioning

confidence: 99%

“…While there are numerous solutions to assess the bottlenecks generated from data workflows in HPC clusters [13], [14], [16]-[23], the metadata counterpart has not received the same level of attention, and existing approaches are suboptimal.…”

Section: Introduction · mentioning

confidence: 99%

Protecting Metadata Servers From Harm Through Application-level I/O Control

Macedo, Miranda, Tanimura et al. 2022

2022 IEEE International Conference on Cluster Computing (CLUSTER)


Modern large-scale I/O applications that run on HPC infrastructures are increasingly becoming metadata-intensive. Unfortunately, having multiple concurrent applications submitting massive amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to unresponsiveness of the storage backend and overall performance degradation. To address these challenges, we present PADLL, a storage middleware that enables system administrators to proactively control and ensure QoS over metadata workflows in HPC storage systems. We demonstrate its performance and feasibility by controlling the rate of both synthetic and realistic I/O workloads. Results show that PADLL can dynamically control metadata-aggressive workloads, prevent I/O burstiness, and ensure I/O fairness and prioritization.
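The abstract says PADLL controls the rate of metadata workloads to ensure fairness and prioritization, but it does not say how rates are chosen. One standard way an administrator could split a metadata server's operation budget among jobs is weighted max-min fairness. In this scheme, jobs that ask for less than their weighted share get exactly their demand, and the leftover is redistributed among the rest. The sketch below is an assumption for illustration only and is not PADLL's algorithm. The function name `weighted_max_min`, the ops/s units and the example jobs are hypothetical, and weights are assumed to be positive.

```python
def weighted_max_min(budget: float, demands: dict[str, float],
                     weights: dict[str, float]) -> dict[str, float]:
    """Split `budget` (e.g. metadata ops/s) among jobs by weighted max-min fairness.

    Hypothetical helper: jobs whose demand fits in their weighted share are
    capped at their demand; the leftover is re-split among the remaining jobs
    in proportion to their (positive) weights.
    """
    alloc = {job: 0.0 for job in demands}
    active = {job for job, d in demands.items() if d > 0}
    remaining = budget
    while active and remaining > 1e-12:
        total_w = sum(weights[j] for j in active)
        share = {j: remaining * weights[j] / total_w for j in active}
        # Jobs whose outstanding demand fits inside their share are satisfied now.
        satisfied = {j for j in active if demands[j] - alloc[j] <= share[j]}
        if not satisfied:
            # Every remaining job wants more than its share: give each its share.
            for j in active:
                alloc[j] += share[j]
            break
        for j in satisfied:
            remaining -= demands[j] - alloc[j]
            alloc[j] = demands[j]
        active -= satisfied
    return alloc


# Three jobs share a hypothetical 1000 ops/s metadata budget with equal weights.
print(weighted_max_min(1000, {"a": 100, "b": 600, "c": 600},
                       {"a": 1, "b": 1, "c": 1}))
```

Raising a job's weight is one way to express prioritization. The resulting per-job rates could then be enforced by any rate limiter placed in front of the metadata calls.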


“…System overhead. QoS guarantee has been extensively studied in storage systems and implemented in different ways. Hua et al.²² proposed a software-defined QoS framework that using token-bucket mechanisms for bandwidth control and for guaranteeing applications' I/O requirements. The key idea of this work is to integrate software-defined components into storage systems and provide a fine-grained QoS.…”

mentioning

confidence: 99%

DDL‐QoS: A dynamic I/O scheduling strategy of QoS for HPC applications

Yang, Shi, Liu et al. 2019

Concurrency and Computation

Self Cite


As high-performance computing (HPC) increasingly moves to the cloud, more users submit applications to the platform simultaneously and want them to finish before their deadlines. Moreover, because I/O contention severely degrades overall performance, a deadline-sensitive I/O scheduler is needed to allocate storage resources according to application requirements and thereby guarantee the quality of service (QoS) of concurrently running applications. In this paper, we first model historical data to explore how interference between applications affects bandwidth allocation. We then introduce a metric called random percentage, which represents how random an application's I/O is and is used to guide subsequent I/O scheduling. We design a dynamic I/O scheduler named DDL-QoS that uses solid-state drives (SSDs) as a QoS guarantee to minimize interference and ensure that applications meet their deadlines. A strength of our design is that the greater the I/O interference, the greater the performance improvement, although this improvement is bounded by the physical properties of the storage hardware.

KEYWORDS: Deadline, I/O Scheduler, QoS

1 INTRODUCTION

High-performance computing (HPC) systems are now entering the exascale era.¹ IBM Summit, the world's fastest supercomputer, located at Oak Ridge National Laboratory, is capable of 200 PetaFlops.² Sunway TaihuLight, the fastest supercomputer in China, has a peak performance of 120 PetaFlops.³ This explosive growth in computing power requires the underlying parallel file systems to deliver higher performance. At the same time, more and more data-intensive applications run at large scale on HPC systems, increasing the demand for capable storage systems.
In addition, parallel file systems commonly used in HPC, such as Lustre,⁴ GPFS,⁵ and OrangeFS,⁶ are beginning to face significant challenges in performance, complexity, and other respects.⁷ Meanwhile, HPC is migrating to the cloud as more HPC users turn to it to help solve their workload challenges, and many public cloud providers, such as Amazon Web Services⁸ and Alibaba Cloud Computing,⁹ have launched HPC products. As a result, storage resources in HPC systems are shared among a growing number of users. As scientific applications develop, the scale of computation grows and more storage resources are needed, so limited storage resources must serve more applications. When multiple applications access the storage service concurrently, they compete for I/O resources, which causes a serious drop in aggregate I/O bandwidth.¹⁰ In addition, their I/O requests, which follow different access patterns, become interleaved. Hard disk drives (HDDs) handle sequential (continuous) requests efficiently, whereas random requests incur more seek overhead, degrading overall performance.¹¹ We call this I/O i...
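The abstract names two quantities a deadline-aware scheduler reasons about: how random an application's I/O is, and what it needs to meet its deadline. The excerpt does not define "random percentage." The sketch below therefore assumes one plausible definition: the share of requests that do not start where the previous request ended, given a trace of `(offset, size)` pairs. It also assumes that the bandwidth requirement is simply the remaining bytes divided by the time left before the deadline. Both function names and definitions are hypothetical illustrations, not the DDL-QoS formulas.

```python
def random_percentage(requests: list[tuple[int, int]]) -> float:
    """Assumed definition: percentage of requests that do not start where the
    previous request ended. `requests` holds (offset, size) pairs in issue order."""
    if len(requests) < 2:
        return 0.0
    jumps = sum(
        1
        for (prev_off, prev_size), (off, _) in zip(requests, requests[1:])
        if off != prev_off + prev_size  # non-contiguous: a seek on an HDD
    )
    return 100.0 * jumps / (len(requests) - 1)


def required_bandwidth(remaining_bytes: float, now: float, deadline: float) -> float:
    """Minimum sustained bandwidth (bytes/s) needed to finish before `deadline`."""
    if now >= deadline:
        return float("inf")  # the deadline has already passed
    return remaining_bytes / (deadline - now)


trace = [(0, 4096), (4096, 4096), (1_000_000, 4096), (1_004_096, 4096), (0, 4096)]
print(random_percentage(trace))                 # half of the transitions jump
print(required_bandwidth(8e9, now=0, deadline=100))
```

Given the HDD seek cost described in the introduction, a scheduler could, for example, steer applications with a high random percentage toward SSDs first. This mapping is our assumption and is not a claim about how DDL-QoS decides.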

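The statements above describe Hua et al.'s framework as using token-bucket mechanisms for bandwidth control and for guaranteeing applications' I/O requirements. For readers unfamiliar with the mechanism, the sketch below is a generic, textbook token bucket and not the implementation from the paper. It rests on several of our own assumptions: the `TokenBucket` class and `try_consume` method are hypothetical names, tokens stand for megabytes, and the current time is passed in explicitly so the behavior is deterministic. A real limiter would read a monotonic clock such as `time.monotonic()`.

```python
class TokenBucket:
    """Generic token bucket: tokens accrue at `rate` per second up to `capacity`;
    a request costing `cost` tokens (here: MB) may proceed only if enough remain."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate          # sustained (guaranteed) rate, e.g. MB/s
        self.capacity = capacity  # largest burst the bucket can absorb, e.g. MB
        self.tokens = capacity    # start full, allowing an initial burst
        self.last = now

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, cost: float, now: float) -> bool:
        """Admit the request (and spend its tokens) if the bucket holds enough."""
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=100.0, capacity=200.0)  # 100 MB/s, 200 MB burst
print(bucket.try_consume(150, now=0.0))  # admitted from the initial burst
print(bucket.try_consume(100, now=0.0))  # rejected: only 50 MB of tokens left
print(bucket.try_consume(100, now=0.5))  # admitted after 0.5 s of refill
```

Giving each application its own bucket, with `rate` set to its guaranteed bandwidth, bounds every application's long-run rate while still allowing short bursts up to `capacity`. How Hua et al. set these parameters is not stated in the excerpts above.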
