Querying Large Scientific Data Sets with Adaptable IO System ADIOS

Gu, Jianhua; Klasky, Scott; Podhorszki, Norbert; Qiang, Ji; Wu, Kesheng

doi:10.1007/978-3-319-69953-0_4

Cited by 16 publications

(7 citation statements)

References 23 publications


“…Second, the decoupled approach alleviates the metadata bottleneck of metadata servers of the underlying PFS since query operations can be handled by the external query manager. Finally, as previous studies have demonstrated, scientific data are rarely modified once they are generated, and data consistency between external KV pairs and data files can be easily maintained.…”

Section: Design and Implementation

mentioning

confidence: 99%

“…It may complicate the consistency issues by managing UDM and indexes as external KV pairs. However, as previous studies have demonstrated, scientific data are rarely modified once they are generated, we apply a relaxed consistency model in our current implementation. UniIndex provides encapsulated APIs (eg, setAttribute and buildIndex) and command line utilities to update external metadata objects.…”

Section: Design and Implementation

mentioning

confidence: 99%

“…FastQuery applies bitmap indexing techniques and provides parallel query frameworks to accelerate data selection on HDF5 and NetCDF data formats. Previous works by Wu et al designed a query interface for ADIOS library to allow arbitrary combinations of range conditions on known variables. However, these works store the indexes in a separate index file, which may lead to potential write contention and extra index load costs when processing query operations.…”

Section: Related Work

mentioning

confidence: 99%

“…FastQuery [15] applies bitmap indexing techniques and provides parallel query frameworks to accelerate data selection on HDF5 and NetCDF data formats. Previous works by Wu et al [16,17] designed a query interface for ADIOS [19] library to allow arbitrary combinations of range conditions on known variables.…”

Section: Related Work

mentioning

confidence: 99%

“…There are also some research efforts that have been made to add index and query interface to high-level I/O libraries [15,16,17]. However, these works write indexes to a separate index file, which may lead to the potential write contention and I/O overhead [18]. Driven by the problems mentioned above, we propose two mechanisms to provide efficient data locating services at the granularity of both a file and a record.…”

mentioning

confidence: 99%


UniIndex: An index and query middleware for parallel file systems

Cheng, Wang, et al. 2019

Concurrency and Computation


Summary: As data analysis scenarios keep increasing on high‐performance computing systems, the ability to select a small fraction of data from a large volume of scientific data sets is vital to accelerate scientific discovery. However, parallel file systems lack the ability to provide efficient data locating services at the granularity of both a file and a record. Existing methods for identifying and indexing data are often domain‐specific and do not scale to large scientific data sets. In this paper, we describe the design and implementation of the UniIndex framework, which combines our proposed techniques for user‐annotation extraction, in‐memory cache layer, in‐situ indexing, and parallel query processing. Acting as middleware on top of production file systems, UniIndex enables efficient data locating services with minimal user effort. Our evaluations show that UniIndex can locate target files from directories containing millions of files in microseconds. By applying in situ indexing and the lightweight range‐bitmap index, record‐level index building time can be dramatically reduced while maintaining a query speedup of up to two orders of magnitude over scanning the entire data set.
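The statements above mention bitmap indexes and queries that combine range conditions on several variables. As a rough illustration of that general idea only, the following Python sketch builds a toy binned bitmap index and intersects two range conditions. It is not the UniIndex, FastQuery, or ADIOS implementation; the class `BinnedBitmapIndex`, the helper `rows`, the bin edges, and the sample data are all hypothetical names and values chosen for this example.

```python
from bisect import bisect_right


class BinnedBitmapIndex:
    """Toy binned bitmap index over one variable (hypothetical, for illustration).

    Bin i covers the half-open interval [edges[i], edges[i + 1]).
    Each bin keeps a Python int used as a bitset: bit r is set when row r
    falls into that bin.
    """

    def __init__(self, values, edges):
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for row, v in enumerate(self.values):
            b = bisect_right(self.edges, v) - 1
            if not 0 <= b < len(self.bitmaps):
                raise ValueError(f"value {v} lies outside the bin edges")
            self.bitmaps[b] |= 1 << row

    def query(self, lo, hi):
        """Return a bitset of the rows whose value v satisfies lo <= v < hi."""
        hits = 0
        for b, bits in enumerate(self.bitmaps):
            b_lo, b_hi = self.edges[b], self.edges[b + 1]
            if b_hi <= lo or b_lo >= hi:
                continue  # bin lies entirely outside the query range
            if lo <= b_lo and b_hi <= hi:
                hits |= bits  # bin lies entirely inside: no raw data needed
                continue
            # Boundary bin: check each candidate row against the raw value.
            row, rest = 0, bits
            while rest:
                if rest & 1 and lo <= self.values[row] < hi:
                    hits |= 1 << row
                rest >>= 1
                row += 1
        return hits


def rows(mask):
    """Expand a bitset into a sorted list of row numbers."""
    return [i for i in range(mask.bit_length()) if mask >> i & 1]


# Hypothetical sample data: two variables recorded for six rows.
energy = [0.5, 2.5, 7.0, 3.2, 9.9, 4.4]
x = [10, 20, 30, 40, 50, 60]

energy_idx = BinnedBitmapIndex(energy, [0, 2, 4, 6, 8, 10])
x_idx = BinnedBitmapIndex(x, [0, 25, 50, 75])

# Combine two range conditions with a bitwise AND of their bitsets.
selected = rows(energy_idx.query(2.0, 5.0) & x_idx.query(15, 45))
print(selected)
```

In this sketch, bins that fall completely inside a query range are answered from the bitmaps alone, and only boundary bins touch the raw values. Conditions on different variables are combined with bitwise AND (or OR) on the resulting bitsets. A production index would also compress the bitmaps and store them alongside or apart from the data, which is the trade-off the citation statements discuss.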
