Pantheon: Exascale File System Search for Scientific Computing

Naps, Joseph L.; Mokbel, Mohamed F.; Du, David Hung-Chang

doi:10.1007/978-3-642-22351-8_29

Cited by 5 publications

(5 citation statements)

References 9 publications


“…We see several ways that query optimization could assist with file system search. In previous work [5], we displayed that including a basic selectivity based query optimizer can provide significant decreases in query response time versus naïve query evaluation. This is an extremely simplistic implementation, with much room to experiment with both existing database query optimization models, and totally new, storage system specific models.…”

Section: Query Optimization (mentioning)

confidence: 98%
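The idea of a selectivity-based optimizer mentioned in the statement above can be illustrated with a minimal sketch. This is not Pantheon's implementation: the record fields, the predicates, and the selectivity estimates are all hypothetical, and the "cost" is simply the number of predicate tests performed. The sketch evaluates a conjunctive query either in the order given (naïve) or with the most selective predicate first.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# (label, test function, estimated fraction of records that pass) -- all hypothetical
Predicate = Tuple[str, Callable[[Record], bool], float]


def evaluate(records: List[Record], predicates: List[Predicate],
             optimize: bool = True) -> Tuple[List[Record], int]:
    """Evaluate a conjunction of predicates; return (matches, predicate tests run)."""
    # With optimize=True, test the predicate expected to reject the most records first.
    order = sorted(predicates, key=lambda p: p[2]) if optimize else list(predicates)
    tests_run = 0
    matches = []
    for rec in records:
        passed = True
        for _label, test, _selectivity in order:
            tests_run += 1
            if not test(rec):
                passed = False
                break  # short-circuit: remaining predicates are skipped
        if passed:
            matches.append(rec)
    return matches, tests_run


# Hypothetical file metadata: 100 files, 2 of them owned by "alice".
files = [{"size": i, "owner": "alice" if i % 50 == 0 else "bob"} for i in range(100)]
predicates = [
    ("size >= 10", lambda r: r["size"] >= 10, 0.90),
    ("owner == alice", lambda r: r["owner"] == "alice", 0.02),
]

naive_matches, naive_cost = evaluate(files, predicates, optimize=False)
opt_matches, opt_cost = evaluate(files, predicates, optimize=True)
print(naive_cost, opt_cost)  # prints: 190 102
```

Both orders return the same result set; only the amount of work differs, which is the effect the citing authors attribute to selectivity-based ordering.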

“…Instead of placing the database as another layer, sitting on top of the storage system, we instead look to move database components into the storage system itself. We have started this effort with our Pantheon system [5]. This work is based on the observation that, in order for the high performance computing community to fully leverage what databases have to offer, databases must act as more than just metadata servers.…”

Section: Introduction (mentioning)

confidence: 99%


Toward efficient search for ultrascale storage systems

Naps, Mokbel (2011). Proceedings of the First Annual Workshop on High Performance Computing Meets Databases

Self Cite


As the rate at which scientific computing generates data continues to increase, we are finding that managing this data, in all facets, is quickly becoming more challenging. In many facilities with large scale storage needs, this massive amount of data is stored in distributed, multi-tiered storage systems. It has become imperative to allow for efficient and effective search within these kinds of environments. For some search problems, specifically file system metadata, traditional relational databases have been used with, initially, good results. As the scale of supercomputing has grown, though, we find that it is becoming increasingly difficult for databases to scale up with the volume of metadata that they are managing. In this paper, we propose a new direction for database management techniques within the context of high performance computing, specifically, search within ultrascale storage systems. Instead of using databases as a layer sitting above the storage system, we suggest the movement of database components within the storage system itself. By taking this approach, we aim to leverage the decades of research and tuning that have made relational database technology successful. At the same time, this integration gives us the ability to maintain a better view of the storage system for search optimization. Through this effort, we can position these techniques to better scale to the degree that is required by the high performance computing community currently, and in the future.


“…While these performed well on their test data, they focused strictly on POSIX metadata. Loris [27] and Pantheon [21] were both indexing systems tested for system metadata only. Pantheon used B-trees, which are row-based, and will face challenges with sparse data.…”

Section: File System Indexing (mentioning)

confidence: 99%

“…While there are many workload studies for file system metadata [14,9,8,20,13,28], they have focused on POSIX metadata. Search systems based on them [19,17,27,21] attempt to extrapolate performance for other use cases. By contrast, we examine scientific metadata directly, in order to better understand the design space of scientific metadata and content indexing systems.…”

Section: Introduction (mentioning)

confidence: 99%

Examining extended and scientific metadata for scalable index designs

Parker-Wood, Long, Madden, et al. (2013). Proceedings of the 6th International Systems and Storage Conference (SYSTOR '13)


While file system metadata is well characterized by a variety of workload studies, scientific metadata is much less well understood. We characterize scientific metadata, in order to better understand the implications for index design. Based on our findings, existing solutions for either file system or scientific search will not suffice for indexing a large scientific file system. We describe the problems with existing solutions, and suggest column stores as an alternative approach.


“…SciDB requires the scientific data to be loaded into the database and then use query languages, called Array Query Language (AQL) and Array Functional Language (AFL), to access data. Instead of developing a new database system, there are also efforts to modify file systems with high-level semantics [5], [15], [19]. While these efforts are likely to be accepted by a few scientific communities, we believe that the array data model needs to be supported as a first class citizen instead of being supported through layers of metadata.…”

mentioning

confidence: 99%

Expediting scientific data analysis with reorganization of data

Dong, Byna (2013). 2013 IEEE International Conference on Cluster Computing (CLUSTER)


Data producers typically optimize the layout of data files to minimize the write time. In most cases, data analysis tasks read these files in access patterns different from the write patterns, causing poor read performance. In this paper, we introduce Scientific Data Services (SDS), a framework for bridging the performance gap between writing and reading scientific data. SDS reorganizes data to match the read patterns of analysis tasks and enables transparent data reads from the reorganized data. We implemented an HDF5 Virtual Object Layer (VOL) plugin to redirect the HDF5 dataset read calls to the reorganized data. To demonstrate the effectiveness of SDS, we applied two parallel data organization techniques: a sort-based organization on plasma physics data and a transpose-based organization on mass spectrometry imaging data. We also extended the HDF5 data access API to allow selection of data based on their values through a query interface, called SDS Query. We evaluated the execution time in accessing various subsets of data through the existing HDF5 Read API and SDS Query. We showed that reading the reorganized data using SDS is up to 55X faster than reading the original data.
