Ad Hoc File Systems for High-Performance Computing

Brinkmann, André; Mohror, Kathryn; Yu, Weikuan; Carns, Philip; Cortés, Toni; Klasky, Scott; Miranda, Alberto; Pfreundt, Franz-Josef; Ross, Robert; Vef, Marc-André

doi:10.1007/s11390-020-9801-1

Cited by 29 publications

(14 citation statements)

References 66 publications

“…While emerging storage technologies [39]-[41] that aim to elevate I/O bottlenecks, and projects such as SAGE [42] and DAOS [43], are being intensively studied, I/O will likely remain a challenging aspect when deploying ML workload. By enabling fine-grained profiling and tracing capability, we also enable the opportunity for automated decision making and auto-tuning in the future.…”

Section: Discussion

mentioning

confidence: 99%

tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

Chien

Podobas

2020

2020 IEEE International Conference on Cluster Computing (CLUSTER)

Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and allows instrumentation of TensorFlow operations. However, the current profiler only enables analysis at the TensorFlow platform level and does not provide system-level information. In this paper, we extend TensorFlow Profiler and introduce tf-Darshan, both a profiler and tracer, that performs instrumentation through Darshan. We use the same Darshan shared instrumentation library and implement a runtime attachment without using a system preload. We can extract Darshan profiling data structures during TensorFlow execution to enable analysis through the TensorFlow profiler. We visualize the performance results through TensorBoard, the web-based TensorFlow visualization tool. At the same time, we do not alter Darshan's existing implementation. We illustrate tf-Darshan by performing two case studies on ImageNet image and Malware classification. We show that by guiding optimization using data from tf-Darshan, we increase POSIX I/O bandwidth by up to 19% by selecting data for staging on fast tier storage. We also show that Darshan has the potential of being used as a runtime library for profiling and providing information for future optimization.

“…This requires either an application-specific data management implementation, or the addition of a system software layer that manages the data and presents it to the application. One such software layer is GekkoFS [7,8], an ephemeral file system that is created just-in-time for individual jobs, and that sits on top of the B-APM working across nodes. GekkoFS is ephemeral because it is only live for the duration of a job, and it therefore also needs the ability to move the data into, and out of, B-APM.…”

Section: B-APM as a Distributed File System

mentioning

confidence: 99%

“…IOR benchmark documentation, https://ior.readthedocs.io 8 The IO500 list, https://www.vi4io.org/io500/start…”

mentioning

confidence: 99%

Usage Scenarios for Byte-Addressable Persistent Memory in High-Performance and Data Intensive Computing

Weiland

Homölle

2021

J. Comput. Sci. Technol.

Byte-addressable persistent memory (B-APM) presents a new opportunity to bridge the performance gap between main memory and storage. In this paper, we present the usage scenarios for this new technology, based on the capabilities of Intel's DCPMM. We outline some of the basic performance characteristics of DCPMM, and explain how it can be configured and used to address the needs of memory and I/O intensive applications in the HPC (high-performance computing) and data intensive domains. Two decision trees are presented to advise on the configuration options for B-APM; their use is illustrated with two examples. We show that the flexibility of the technology has the potential to be truly disruptive, not only because of the performance improvements it can deliver, but also because it allows systems to cater for a wider range of applications on homogeneous hardware.

“…A relevant discussion and in-depth analysis on ephemeral systems has already been done by Brinkmann et al [5]; that article discusses the general ideas of ad-hoc file systems as well as the specific characteristics of three implementations: BeeOND [13], GekkoFS [37], and BurstFS [39].…”

Section: Ephemeral Systems

mentioning

confidence: 99%

“…There is a family of storage systems that tackle the interference issues and some of the aforementioned design challenges with a distinct approach: ephemeral storage systems, such as ad-hoc filesystems [5]. Ephemeral systems are designed to be executed next to the computation, as opposed to the more traditional approach of independent storage nodes (shared by multiple applications).…”

Section: Introduction

mentioning

confidence: 99%

Revisiting Active Object Stores: Bringing Data Locality to the Limit With NVM

Barceló

Queralt

Cortes

2021

Preprint

Self Cite

Object stores are widely used software stacks that achieve excellent scale-out with a well-defined interface and robust performance. However, their traditional get/put interface is unable to exploit data locality to its fullest, which keeps them from reaching peak performance. In particular, there is one way to improve data locality that has not yet achieved mainstream adoption: the active object store. Although there are some projects that have implemented the main idea of the active object store, such as Swift's Storlets or Ceph Object Classes, the scope of these implementations is limited.

We believe that there is a huge potential for active object stores in the current status quo. Hyper-converged nodes are bringing more computing capabilities to storage nodes, and vice versa. The proliferation of non-volatile memory (NVM) technology is blurring the line between system memory (fast and scarce) and block devices (slow and abundant). More and more applications need to manage vast amounts of data (data analytics, Big Data, Machine Learning & AI, etc.), demanding bigger clusters and more complex computations. All these elements are potential game changers that need to be evaluated in the scope of active object stores.

More specifically, having NVM devices presents additional opportunities, such as in-place execution. Being able to use the NVM from within the storage system while taking advantage of in-place execution (thanks to the byte-addressable nature of the NVM), in conjunction with the computing capabilities of hyper-converged nodes, can lead to active object stores that greatly outperform their non-active counterparts.

In this article we propose an active object store software stack and evaluate it on an NVM-populated node. We will show how this setup is able to reduce execution times from 10% up to more than 90% in a variety of representative application scenarios. Our discussion will focus on the active aspect of the system as well as on the implications of the memory configuration.
