2017
DOI: 10.1002/cpe.4161

A flexible I/O arbitration framework for netCDF‐based big data processing workflows on high‐end supercomputers

Abstract: On the verge of the convergence between high‐performance computing and Big Data processing, it has become increasingly prevalent to deploy large‐scale data analytics workloads on high‐end supercomputers. Such applications often come in the form of complex workflows with various different components, assimilating data from scientific simulations as well as from measurements streamed from sensor networks, such as radars and satellites. For example, as part of the Flagship 2020 (post‐K) supercomputer proj…

Citations: cited by 11 publications (5 citation statements)
References: 34 publications
“…More generally, both hardware‐ and software‐based strategies for speeding up inter‐executable communication on distributed memory computers with little or no impact on the communicating programs are emerging within the community (e.g. (Liao et al )).…”
Section: Discussion (mentioning)
confidence: 99%
“…Based on this observation, sequential prefetching was proposed to optimize I/O performance, and has long been used in operating systems [15]. UNIX-based operating systems [16], database servers [17,18], distributed file systems [19], big data processing [20], cloud storage systems [21], mobile devices [22], and high-end storage controllers [23] also employ sequential prefetching. Linux has used sequential prefetching since 2002 [24][25][26].…”
Section: Linux Readahead Scheme (mentioning)
confidence: 99%
“…Most of these mechanisms can also be used in processing environmental data, as both fields are expected to access the data anywhere and anytime efficiently and securely. Liao et al 25 … The majority of contemporary storage formats for multidimensional arrays, such as HDF5, NetCDF, TIFF, and Zarr, provide the ability to manage the storage and compression of data; however, these options can yield notable performance consequences. The solution in Reference 34 explored different strategies for chunking the data into units of different sizes across the dataset's temporal and spatial dimensions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Most of these mechanisms can also be used in processing environmental data, as both fields are expected to access the data anywhere and anytime efficiently and securely. Liao et al 25 presented a direct communication framework designed for complex workflows that eliminate unnecessary file I/O among components in large‐scale HPC systems. Specifically, they propose an I/O arbitration layer that provides direct parallel data transfer (both synchronous and asynchronous) among job components that rely on the netCDF interface for performing I/O operations.…”
Section: Related Work (mentioning)
confidence: 99%