On the energy (in)efficiency of Hadoop clusters

Leverich, Jacob; Kozyrakis, Christos

doi:10.1145/1740390.1740405

Cited by 273 publications

(168 citation statements)

References 6 publications


“…For example, studies on how to predict MapReduce job running times [20], [21] can evaluate their mechanisms on realistic job mixes. Studies on MapReduce energy efficiency [22], [23] can quantify energy savings under realistic workload fluctuations. Various efforts to develop effective MapReduce workload management schemes [24], [7] can generalize their findings across different realistic workloads.…”

Section: Towards Mapreduce Workload Suites (mentioning)

confidence: 99%

The Case for Evaluating MapReduce Performance Using Workload Suites

Chen

Ganapathi

Griffith

et al. 2011

2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems


Abstract: MapReduce systems face enormous challenges due to increasing growth, diversity, and consolidation of the data and computation involved. Provisioning, configuring, and managing large-scale MapReduce clusters require realistic, workload-specific performance insights that existing MapReduce benchmarks are ill-equipped to supply. In this paper, we build the case for going beyond benchmarks for MapReduce performance evaluations. We analyze and compare two production MapReduce traces to develop a vocabulary for describing MapReduce workloads. We show that existing benchmarks fail to capture rich workload characteristics observed in traces, and propose a framework to synthesize and execute representative workloads. We demonstrate that performance evaluations using realistic workloads give cluster operators new ways to identify workload-specific resource bottlenecks and to make workload-specific choices of MapReduce task schedulers. We expect that once available, workload suites would allow cluster operators to accomplish previously challenging tasks beyond what we can now imagine, thus serving as a useful tool to help design and manage MapReduce systems.


“…Each possible solution has its own business tradeoffs. There is significant research work in progress in industry and academia to address this problem, but many challenges still remain [3,18,20,26].…”

Section: Other Issues (mentioning)

confidence: 99%

Web-Scale Job Scheduling

Cirne

Frachtenberg

2013

Job Scheduling Strategies for Parallel Processing


Abstract. Web datacenters and clusters can be larger than the world's largest supercomputers, and run workloads that are at least as heterogeneous and complex as their high-performance computing counterparts. And yet little is known about the unique job scheduling challenges of these environments. This article aims to ameliorate this situation. It discusses the challenges of running web infrastructure and describes several techniques to address them. It also presents some of the problems that remain open in the field.


“…The initial work on MapReduce cluster energy management was presented in [17] based on covering subset (CS). In that work, the CS nodes are manually determined, and one replica for each data item is then placed in one of the CS nodes.…”

Section: Mapreduce Cluster Energy Management (mentioning)

confidence: 99%

“…definition in [17]. The CS used here is not a static node set, rather it is discovered on demand based on a given list of data blocks required for computation.…”

Section: Node Set Discovery Algorithms (mentioning)

confidence: 99%

“…In a recent work [23], heterogeneity in a MapReduce cluster was considered for job scheduling and performance improvement. There are several recent research efforts dealing with energy management for MapReduce clusters [17], [16], but heterogeneity in such clusters has not been considered yet. In this paper, we examine how energy consumption can be further optimized by taking into account the different power requirements of the nodes in the cluster.…”

Section: Introduction (mentioning)

confidence: 99%


Energy Proportionality and Performance in Data Parallel Computing Clusters

Kim

Chou

Rotem

2011

Lecture Notes in Computer Science


Abstract: Energy consumption in datacenters has recently become a major concern due to rising operational costs and scalability issues. Recent solutions to this problem propose the principle of energy proportionality, i.e., the amount of energy consumed by the server nodes must be proportional to the amount of work performed. For data parallelism and fault tolerance purposes, most common file systems used in MapReduce-type clusters maintain a set of replicas for each data block. A covering subset is a group of nodes that together contain at least one replica of each data block needed for performing computing tasks. In this work, we develop and analyze algorithms to maintain energy proportionality by discovering a covering subset that minimizes energy consumption while placing the remaining nodes in low-power standby mode. Our algorithms can also discover a covering subset in heterogeneous computing environments. In order to allow more data parallelism, we generalize our algorithms so that they can discover a k-covering subset, i.e., a set of nodes that contain at least k replicas of each data block. Our experimental results show that we can achieve substantial energy savings without significant performance loss in diverse cluster configurations and working environments.
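The covering-subset idea in the abstract above can be illustrated with a small sketch. This is not the algorithm from the cited paper, whose details are not given here; it is a generic greedy heuristic for weighted set cover, assuming each node has a known, positive active power draw. The function name `greedy_covering_subset`, the `replicas` and `power` mappings, and the node and block names are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_covering_subset(
    replicas: Dict[str, Set[str]],
    power: Dict[str, float],
    required: Set[str],
    k: int = 1,
) -> List[str]:
    """Greedily pick nodes until each required block has k replicas on them.

    replicas: node name -> block ids stored on that node (assumed layout).
    power:    node name -> active power draw in watts (assumed, must be > 0).
    """
    # Number of replicas still missing for each required block.
    need = {block: k for block in required}
    chosen: List[str] = []
    candidates = set(replicas)

    while any(n > 0 for n in need.values()):
        best, best_ratio = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for node in sorted(candidates):
            gain = sum(1 for b in replicas[node] if need.get(b, 0) > 0)
            ratio = gain / power[node]  # useful replicas per watt
            if ratio > best_ratio:
                best, best_ratio = node, ratio
        if best is None:
            raise ValueError("not enough replicas to cover every block")
        chosen.append(best)
        candidates.remove(best)
        for b in replicas[best]:
            if need.get(b, 0) > 0:
                need[b] -= 1
    return chosen


# Hypothetical heterogeneous cluster: n4 holds everything but draws more power.
replicas = {
    "n1": {"a", "b"},
    "n2": {"b", "c"},
    "n3": {"a", "c"},
    "n4": {"a", "b", "c"},
}
power = {"n1": 100.0, "n2": 100.0, "n3": 100.0, "n4": 250.0}
active = greedy_covering_subset(replicas, power, {"a", "b", "c"})
standby = sorted(set(replicas) - set(active))
```

Nodes left in `standby` are the candidates for a low-power mode. A greedy heuristic gives no guarantee of minimum energy; setting `k` above 1 asks for the k-covering variant mentioned in the abstract.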
