A Checkpoint of Research on Parallel I/O for High-Performance Computing

Boito, Francieli Zanon; Inacio, Eduardo C.; Bez, Jean Luca; Navaux, Philippe O. A.; Dantas, Mário A. R.; Denneulin, Yves

doi:10.1145/3152891

Cited by 36 publications (19 citation statements)

References 122 publications (119 reference statements)


“…To confirm that it was not something specific to the application, we conducted experiments with the BT-IO benchmark from the NPB [16], the second most used benchmark in the parallel I/O research field, as pointed out by [17]. We used the D class, which generates a file of approximately 132.6 GB and yields an execution time on the order of minutes.…”

Section: A. Experiments With the BT-IO Benchmark (mentioning, confidence: 99%)

Collective I/O Performance on the Santos Dumont Supercomputer

Carneiro, Bez, Boito, et al. 2018

2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

Self Cite


The historical gap between processing and data access speeds causes many applications to spend a large portion of their execution time on I/O operations. From the point of view of a large-scale, expensive supercomputer, it is important to ensure that applications achieve the best I/O performance to promote efficient usage of the machine. In this paper, we evaluate the I/O infrastructure of the Santos Dumont supercomputer, the largest in Latin America. More specifically, we investigate the performance of collective I/O operations. By analyzing a scientific application that runs on the machine, we identify large performance differences between the available MPI implementations. We then further study the observed phenomenon using the BT-IO and IOR benchmarks, in addition to a custom microbenchmark. We conclude that the customized MPI implementation by Bull (used by more than 20% of the jobs) presents the worst performance for small collective write operations. Our results are being used to help Santos Dumont users achieve the best performance for their applications. Additionally, by investigating the observed phenomenon, we provide information to help improve future MPI-IO collective write implementations.


“…It is possible to observe that the applications running in SDumont mostly perform write operations with larger sizes than those observed by related work. Previous studies (Boito et al 2018; Carns et al 2009) already demonstrated that issuing larger requests results in higher I/O performance.…”

Section: Part I: Study of the I/O Workload (mentioning, confidence: 90%)

I/O performance of the Santos Dumont supercomputer

Bez, Carneiro, Pavan, et al. 2019

The International Journal of High Performance Computing Applications

Self Cite


In this article, we study the I/O performance of the Santos Dumont supercomputer, since the gap between processing and data access speeds causes many applications to spend a large portion of their execution time on I/O operations. For a large-scale, expensive supercomputer, it is essential to ensure that applications achieve the best I/O performance to promote efficient usage. We monitor a week of the machine’s activity and present a detailed study of the obtained metrics, aiming to provide an understanding of its workload. From experience with one numerical simulation, we identified large I/O performance differences between the MPI implementations available to users. We investigated the phenomenon and narrowed it down to collective I/O operations with small request sizes. For these, we concluded that the customized MPI implementation by the machine’s vendor (used by more than 20% of the jobs) presents the worst performance. By investigating the issue, we provide information to help improve future MPI-IO collective write implementations, as well as practical guidelines to help users and steer future system upgrades. Finally, we discuss the challenge of describing applications’ I/O behavior without depending on information from users. Such a description allows identifying an application’s I/O bottlenecks and proposing ways to improve its I/O performance. We propose a methodology to do so and use GROMACS, the application with the largest number of jobs in 2017, as a case study.


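“…For example, isolating the faulty node, migrating the tasks that were running on the faulty node to a backup node, and then adding the backup node to the system restores the system’s normal function and performance. Checkpointing is another technique for keeping programs running over long periods [94]. The system periodically saves the program’s execution state at checkpoints…”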

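Section: [...] heat, keeping the chip junction temperature at a very low level and effectively improving the reliability of long-running system operation (unclassified)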

Key issues in exascale computing

Qian, Wang 2020

Sci. Sin.-Inf.


Over the past several decades, high performance computing (HPC) in China has undergone tremendous growth under the continuous support of national research programs. The development of exascale computers is the current goal set by the National Key R&D Project on HPC. Starting with a brief historical review of China's HPC development, this article analyzes the major challenges encountered in developing exascale computers. Thereafter, some important issues in realizing exascale computing are discussed, including architecture, processor, interconnect, parallel system software, parallel programming, algorithm, and resilience.
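The periodic checkpointing described in the excerpt from "Key issues in exascale computing" (the system saves the program's execution state at intervals so that work can resume after a failure) can be illustrated with a minimal, single-process Python sketch. It is not taken from the cited article: the JSON state format, the checkpoint path, the `interval` parameter, and the `fail_at` crash simulation are all hypothetical. Writing to a temporary file and then renaming it over the previous checkpoint is an assumed design choice, so that a crash during a save does not leave a half-written checkpoint.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` to a temp file, then rename it over the previous checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches storage before the rename
    os.replace(tmp_path, path)  # atomic on POSIX when both paths are on one file system


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(path, total_steps, interval, fail_at=None):
    """Toy iterative computation that checkpoints every `interval` steps.

    Resumes from the last checkpoint if one exists. `fail_at` (hypothetical)
    simulates a node failure when the loop reaches that step.
    """
    state = load_checkpoint(path) or {"step": 0, "value": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["value"] += state["step"]  # stand-in for real work
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state
    return state
```

With a simulated failure at step 7 and a checkpoint every 5 steps, the saved state is the one from step 5, and a second call resumes from there rather than from step 0. Real HPC checkpoints are written by many processes in parallel, typically through MPI-IO or an I/O library, which this single-process sketch does not model.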
