Abstract. The FLASH code is a computational science tool for simulating and studying thermonuclear reactions. The program periodically outputs large checkpoint files (to resume a calculation from a particular point in time) and smaller plot files (for visualization and analysis). Initial experiments on Blue Gene/P spent excessive time in input/output (I/O), making it difficult to do actual science. Our investigation of time spent in I/O revealed several locations in the I/O software stack where we could make improvements. Fixing data corruption in the MPI-IO library allowed us to use collective I/O, yielding an order of magnitude improvement. Restructuring the data layout provided a more efficient I/O access pattern and yielded another doubling of performance, but broke format assumptions made by other tools in the application workflow. Using new nonblocking APIs in the Parallel netCDF library allowed us to keep high performance and maintain backward compatibility. The I/O research community has studied a host of optimizations and strategies. Sometimes the challenge for applications is knowing how to apply these new techniques to production codes. In this case study, we offer a demonstration of how computational scientists, with a detailed understanding of their application, and the I/O community, with a wide array of approaches from which to choose, can magnify each other's efforts and achieve tremendous application productivity gains.
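To make the collective-I/O step concrete, the sketch below writes one block-distributed 2-D array through Parallel netCDF's blocking collective API (ncmpi_put_vara_double_all), so that the underlying MPI-IO layer can merge per-process requests into large file accesses. This is an illustrative sketch only: the variable and dimension names, array sizes, and row-block decomposition are assumptions for the example, not FLASH's actual checkpoint layout, and error checking is omitted for brevity.

```c
/* Sketch: collective write of a distributed 2-D array with Parallel netCDF.
 * Build with an MPI compiler and link against PnetCDF, e.g.:
 *   mpicc checkpoint.c -lpnetcdf
 * Names ("density", "x", "y") and sizes are illustrative only. */
#include <mpi.h>
#include <pnetcdf.h>
#include <stdlib.h>

#define NX_GLOBAL 1024   /* global rows; assumed divisible by nprocs */
#define NY        1024   /* columns, identical on every process      */

int main(int argc, char **argv)
{
    int rank, nprocs, ncid, dimids[2], varid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each process owns one contiguous block of rows. */
    MPI_Offset nx_local = NX_GLOBAL / nprocs;
    double *buf = malloc(nx_local * NY * sizeof(double));
    for (MPI_Offset i = 0; i < nx_local * NY; i++) buf[i] = (double)rank;

    ncmpi_create(MPI_COMM_WORLD, "checkpoint.nc", NC_CLOBBER | NC_64BIT_DATA,
                 MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "x", NX_GLOBAL, &dimids[0]);
    ncmpi_def_dim(ncid, "y", NY, &dimids[1]);
    ncmpi_def_var(ncid, "density", NC_DOUBLE, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* Collective write: every process participates, letting the MPI-IO
     * layer aggregate the per-process requests into large accesses. */
    MPI_Offset start[2] = { rank * nx_local, 0 };
    MPI_Offset count[2] = { nx_local, NY };
    ncmpi_put_vara_double_all(ncid, varid, start, count, buf);

    ncmpi_close(ncid);
    free(buf);
    MPI_Finalize();
    return 0;
}
```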
There are two popular parallel I/O programming styles used by modern scientific applications: unique-file and shared-file. Unique-file I/O usually delivers satisfactory performance, but its major drawback is that the sheer number of files produced can overwhelm post-simulation data processing. Shared-file I/O produces fewer files and allows arrays partitioned among processes to be saved in canonical order. As the number of processors on modern parallel machines grows into the thousands and beyond, the problem size, and in turn the global array size, increases proportionally, and it becomes impractical to manage files that are each larger than a few hundred gigabytes. To seek a middle ground between these two I/O styles, we propose a subfiling scheme that divides a large multi-dimensional global array into smaller subarrays, each saved in a smaller file called a subfile. Subfiling is implemented on top of MPI-IO. We also incorporate it into the Parallel netCDF library in order to preserve the partitioning information in the netCDF file header, so that the global array can later be reconstructed. In addition, because subfiling decreases the number of processes sharing a file, it reduces the overhead of the file system's data-consistency control. Our experimental results with several I/O benchmarks show that subfiling provides improved I/O performance.
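A minimal sketch of the idea follows, assuming the simplest case of a 1-D array partitioned contiguously across processes: ranks are divided into groups with MPI_Comm_split, and each group writes its subarray collectively to its own subfile through MPI-IO. The group size and file-naming convention are invented for illustration, and the step where the partitioning metadata is recorded in the netCDF header is omitted.

```c
/* Sketch of subfiling on top of MPI-IO: split processes into groups,
 * each group writing its portion of the global array to its own subfile.
 * Group size and file names are illustrative, not the library's scheme. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ELEMS_PER_PROC 1048576   /* local array size (doubles)  */
#define PROCS_PER_SUBFILE 32     /* assumed subfile group size  */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(ELEMS_PER_PROC * sizeof(double));
    for (int i = 0; i < ELEMS_PER_PROC; i++) buf[i] = (double)rank;

    /* Processes with the same color share one subfile. */
    int color = rank / PROCS_PER_SUBFILE;
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &subcomm);

    int subrank;
    MPI_Comm_rank(subcomm, &subrank);

    char fname[64];
    snprintf(fname, sizeof(fname), "array.subfile.%04d", color);

    /* Collective write within the smaller subfile communicator: fewer
     * processes share each file, which is what reduces the file
     * system's data-consistency overhead. */
    MPI_File fh;
    MPI_File_open(subcomm, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)subrank * ELEMS_PER_PROC * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, ELEMS_PER_PROC, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Comm_free(&subcomm);
    free(buf);
    MPI_Finalize();
    return 0;
}
```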
Abstract. Parallel scientific applications have long been characterized by their computation and communication patterns. From a storage and I/O perspective, these applications can also be grouped into distinct data models based on how data is organized and accessed during simulation, analysis, and visualization. Parallel netCDF is a popular library used in many scientific applications to store scientific datasets and provide high-performance parallel I/O. Although the metadata-rich netCDF file format can effectively store and describe regular multi-dimensional array datasets, it does not address the full range of current and future computational science data models. In this paper, we present a new storage scheme in Parallel netCDF that represents a broad variety of data models used in modern computational science applications. The scheme also allows concurrent metadata construction for different data objects from multiple groups of application processes, an important feature for obtaining a high degree of I/O parallelism in data models with irregular data distributions. Furthermore, we employ nonblocking I/O functions to aggregate irregularly distributed data requests into large, contiguous requests and thereby achieve high I/O performance. Using an adaptive mesh refinement data model as an example, we demonstrate that the proposed scheme produces scalable performance for both data and metadata creation and access.
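The aggregation step can be illustrated with PnetCDF's nonblocking interface: each process posts one write per locally owned block with ncmpi_iput_vara_double and flushes them all with a single collective ncmpi_wait_all, at which point the library can merge the pending irregular requests into large contiguous accesses. The block layout below (rank r owning r+1 blocks) is invented for the example and assumes a modest process count; the paper's actual storage scheme for irregular data models is richer than this sketch.

```c
/* Sketch: aggregating irregularly distributed writes with PnetCDF's
 * nonblocking API. Each process owns a different number of fixed-size
 * blocks (an AMR-like pattern, invented here). Error checks omitted. */
#include <mpi.h>
#include <pnetcdf.h>
#include <stdlib.h>

#define BLOCK          256     /* elements per block        */
#define NBLOCKS_GLOBAL 4096    /* total blocks in the file  */

int main(int argc, char **argv)
{
    int rank, ncid, dimid, varid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    ncmpi_create(MPI_COMM_WORLD, "amr.nc", NC_CLOBBER | NC_64BIT_DATA,
                 MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "elem", (MPI_Offset)NBLOCKS_GLOBAL * BLOCK, &dimid);
    ncmpi_def_var(ncid, "field", NC_DOUBLE, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    /* Irregular ownership: rank r owns r+1 consecutive blocks, so the
     * request count differs from process to process. */
    int nblocks = rank + 1;
    int *reqs  = malloc(nblocks * sizeof(int));
    int *stats = malloc(nblocks * sizeof(int));
    double *buf = malloc((size_t)nblocks * BLOCK * sizeof(double));

    int first = rank * (rank + 1) / 2;   /* index of this rank's first block */
    for (int b = 0; b < nblocks; b++) {
        for (int i = 0; i < BLOCK; i++) buf[b * BLOCK + i] = (double)rank;
        MPI_Offset start = (MPI_Offset)(first + b) * BLOCK;
        MPI_Offset count = BLOCK;
        /* Post, but do not yet perform, one write per block. */
        ncmpi_iput_vara_double(ncid, varid, &start, &count,
                               &buf[b * BLOCK], &reqs[b]);
    }

    /* One collective flush: PnetCDF merges the pending requests into
     * large contiguous MPI-IO accesses. */
    ncmpi_wait_all(ncid, nblocks, reqs, stats);

    ncmpi_close(ncid);
    free(reqs); free(stats); free(buf);
    MPI_Finalize();
    return 0;
}
```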