I/O forwarding is an established and widely adopted technique in HPC to reduce contention and improve I/O performance when accessing shared storage infrastructure. On such machines, this layer is often physically deployed on dedicated nodes, and its connection to the clients is static. Furthermore, the increasingly heterogeneous workloads entering HPC installations stress the I/O stack, requiring tuning and reconfiguration based on the applications' characteristics. Nonetheless, it is not always feasible in a production system to explore the potential benefits of this layer under different configurations without impacting clients. In this paper, we investigate the effects of I/O forwarding on performance by considering the application's I/O access patterns and system characteristics. We seek to determine when forwarding is the best choice for an application, how many I/O nodes it would benefit from, and whether not using forwarding at all might be the correct decision. To gather performance metrics and to explore and understand the impact of forwarding I/O requests with different access patterns, we implemented FORGE, a lightweight user-space I/O forwarding layer. Using FORGE, we evaluated the optimal forwarding configurations for several access patterns on the MareNostrum 4 (Spain) and Santos Dumont (Brazil) supercomputers. Our results demonstrate that shifting the focus from a static, system-wide deployment to an on-demand, reconfigurable I/O forwarding layer dictated by application demands can improve I/O performance on future machines.