Easing the burdens of HPC file management

Jones, Stephanie; Strong, Christina R.; Parker-Wood, Aleatha; Holloway, Alexandra; Long, Darrell D. E.

doi:10.1145/2159352.2159359

Cited by 11 publications

(3 citation statements)

References 18 publications


“…One such application, file-per-process (N-N) checkpointing, requires the metadata service to handle a huge number of file creates all at the beginning of the checkpoint [9]. Another example, storage management, produces a read-intensive metadata workload that typically scans the metadata of the entire file system to perform administrative tasks [28], [30]. Finally, even in the era of big data, most files in even the largest cluster file systems are small [19], [61], where median file size is often only hundreds of kilobytes.…”

Section: Introduction (mentioning)

confidence: 99%

IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion

Ren, Zheng, Patil, et al. 2014

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis

118


Abstract: The growing size of modern storage systems is expected to exceed billions of objects, making metadata scalability critical to overall performance. Many existing distributed file systems only focus on providing highly parallel fast access to file data, and lack a scalable metadata service. In this paper, we introduce a middleware design called IndexFS that adds support to existing file systems such as PVFS, Lustre, and HDFS for scalable high-performance operations on metadata and small files. IndexFS uses a table-based architecture that incrementally partitions the namespace on a per-directory basis, preserving server and disk locality for small directories. An optimized log-structured layout is used to store metadata and small files efficiently. We also propose two client-based storm-free caching techniques: bulk namespace insertion for creation-intensive workloads such as N-N checkpointing; and stateless consistent metadata caching for hot spot mitigation. By combining these techniques, we have demonstrated IndexFS scaled to 128 metadata servers. Experiments show our out-of-core metadata throughput out-performing existing solutions such as PVFS, Lustre, and HDFS by 50% to two orders of magnitude.



“…The most similar work to TrueNames is that of Jones et al [17], who proposed a non-hierarchical HPC file system with automatically generated file names, chosen by examining the distribution of metadata fields. By contrast, our work uses a more robust and less complex scheme which puts the user and application in control of which metadata is used, and allows them to select attributes which are most appropriate for the file's semantic type, rather than relying on statistical techniques.…”

Section: Non-hierarchical and Semantic File Systems (mentioning)

confidence: 99%

A File By Any Other Name

Parker-Wood, Long, Miller, et al. 2014

Proceedings of International Conference on Systems and Storage

Self Cite


File names are one of the earliest computing abstractions, a string of characters to uniquely identify a file for the system, and to help users remember the contents when they look for it later. They are also a rich source of semantic metadata about files. However, this metadata is unstructured and opaque to the rest of the system. As a result, metadata in file names is often error-prone, and hard to search for. File names can and should be more meaningful and reliable, while simplifying application design and encouraging users and applications to provide more metadata for search. We describe a POSIX-compliant prototype file system, TrueNames, which demonstrates an alternate approach to naming and metadata, called metadata-aware naming. TrueNames separates the task of uniquely identifying a file from the task of helping the user remember its contents. It captures metadata in a structured format for later indexing, and uses it to generate file names which are correct, regenerable, and disambiguatable by design. TrueNames simplifies application design by providing a consistent interface for metadata-aware naming, incurs a low overhead of approximately 15% under realistic workloads, and can simplify a wide variety of data management tasks for both applications and users.


“…One such example, checkpointing, requires the metadata service to handle large number of file creates and updates at very high speeds [6]. Another example, storage management, produces read-intensive metadata workload that typically scans the metadata of the entire file system to perform administration tasks for analyzing and querying metadata [11], [13].…”

Section: Introduction (mentioning)

confidence: 99%