2010
DOI: 10.1007/s10586-010-0132-9

Middleware support for many-task computing

Abstract: While the I/O functions described in the MPI standard included shared file pointer support from the beginning, the performance and portability of these functions have been subpar at best. ROMIO [1], which provides the MPI-IO functionality for most MPI libraries, to this day uses a separate file to manage the shared file pointer. This file provides the shared location that holds the current value of the shared file pointer. Unfortunately, each access to the shared file pointer involves file lock management and …
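The abstract describes ROMIO's approach concretely: a separate file holds the current value of the shared file pointer, and every access locks that file, reads the offset, performs the I/O, and advances the pointer. The following is a minimal, hypothetical Python sketch of that mechanism — the function name and file layout are illustrative assumptions, not ROMIO's actual implementation (which lives in C inside the MPI library):

```python
# Sketch (assumed names) of a file-backed shared file pointer in the
# style the abstract attributes to ROMIO. POSIX-only: uses fcntl.flock.
import fcntl
import os
import struct

def write_shared(data_path, ptr_path, payload):
    """Write `payload` at the shared offset stored in `ptr_path`,
    then advance the pointer. Returns the offset written at."""
    ptr_fd = os.open(ptr_path, os.O_CREAT | os.O_RDWR)
    try:
        # Every access pays a whole-file lock on the pointer file.
        fcntl.flock(ptr_fd, fcntl.LOCK_EX)
        raw = os.pread(ptr_fd, 8, 0)
        offset = struct.unpack("<q", raw)[0] if len(raw) == 8 else 0
        data_fd = os.open(data_path, os.O_CREAT | os.O_WRONLY)
        try:
            os.pwrite(data_fd, payload, offset)  # write at shared offset
        finally:
            os.close(data_fd)
        # Advance the shared pointer past what we just wrote.
        os.pwrite(ptr_fd, struct.pack("<q", offset + len(payload)), 0)
    finally:
        fcntl.flock(ptr_fd, fcntl.LOCK_UN)
        os.close(ptr_fd)
    return offset
```

Each call pays one lock acquisition plus a small read and write on the pointer file before any user data moves, which is the overhead the paper identifies as the source of the poor performance of these routines.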

Cited by 38 publications (18 citation statements)
References 38 publications
“…FusionFS has already scaled to 1K nodes, and we aim to scale up FusionFS+HyCache to 10K nodes. We will also apply HyCache to Many-Task Computing (MTC) [31][32][33][34], which has specific emphasis on data-intensive computing [35] and cloud computing [36].…”
Section: Discussion
confidence: 99%
“…FusionFS is optimized for a subset of HPC and many-task computing (MTC) [12,59,62,63] workloads, and it is designed for extreme scales [61]. These workloads are often extremely data-intensive [56,58,60], and optimizing data locality [55] becomes critical to achieving good scalability and performance.…”
Section: A. FusionFS: Distributed Metadata Management
confidence: 99%
“…We investigated the largest available trace of real MTC workloads, collected over a 17-month period and comprising 173M tasks [38] [39]. We filtered the logs to isolate only those from the 160K-core IBM Blue Gene/P Intrepid supercomputer at Argonne National Laboratory, which netted about 34.8M tasks with a minimum runtime of 0 seconds, a maximum runtime of 1469.62 seconds, an average runtime of 95.20 seconds, and a standard deviation of 188.08 seconds.…”
Section: A. Workload Trace
confidence: 99%