Joint optimization of overlapping phases in MapReduce

Lin, Mei; Zhang, Li; Wierman, Adam; Tan, Jian

doi:10.1016/j.peva.2013.08.013

Cited by 60 publications

(10 citation statements)

References 29 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In this context, one of the main challenges [25,33] is that the execution time of a MapReduce job is generally unknown in advance. Because of this, predicting the execution time of Hadoop jobs is usually done empirically through experimentation, requiring a costly setup [15].…”

Section: Introductionmentioning

confidence: 99%

Modeling Performance of Hadoop Applications: A Journey from Queueing Networks to Stochastic Well Formed Nets

Ardagna

Bernardi

Gianniti

et al. 2016

Algorithms and Architectures for Parallel Processing

View full text Add to dashboard Cite

Nowadays, many enterprises commit to the extraction of actionable knowledge from huge datasets as part of their core business activities. Applications belong to very different domains such as fraud detection or one-to-one marketing, and encompass business analytics and support to decision making in both private and public sectors. In these scenarios, a central place is held by the MapReduce framework and in particular its open source implementation, Apache Hadoop. In such environments, new challenges arise in the area of jobs performance prediction, with the needs to provide Service Level Agreement guarantees to the end-user and to avoid waste of computational resources. In this paper we provide performance analysis models to estimate MapReduce job execution times in * Acknowledgments: This work has received funding from the European Union Horizon 2020 research and innovation program under grant agreement No. 644869 (DICE). Experimental data are available as open data at https://zenodo.org/record/58847#.V5i0wmXA45Q. 1Hadoop clusters governed by the YARN Capacity Scheduler. We propose models of increasing complexity and accuracy, ranging from queueing networks to stochastic well formed nets, able to estimate job performance under a number of scenarios of interest, including also unreliable resources. The accuracy of our models is evaluated by considering the TPC-DS industry benchmark running experiments on Amazon EC2 and the CINECA Italian supercomputing center. The results have shown that the average accuracy we can achieve is in the range 9-14%.

show abstract

Section: Introductionmentioning

confidence: 99%

Modeling Performance of Hadoop Applications: A Journey from Queueing Networks to Stochastic Well Formed Nets

Ardagna

Bernardi

Gianniti

et al. 2016

Algorithms and Architectures for Parallel Processing

View full text Add to dashboard Cite

show abstract

“…The work in [17] models the execution of Map task through a tandem queue with overlapping phases and provides very efficient run time scheduling solutions for the joint optimization of the Map and copy/shuffle phases. Authors show how their runtime scheduling algorithms match closely the performance of the offline optimal version.…”

Section: Related Workmentioning

confidence: 99%

“…Nevertheless, MapReduce applications have evolved and it is not uncommon that large queries, submitted by different user classes, need to be performed on shared clusters, possibly with some guarantees on their execution time. In this context the main drawback [17,26] is that the execution time of a MapReduce job is generally unknown in advance. In such systems, capacity allocation becomes one of the most important aspects.…”

Section: Introductionmentioning

confidence: 99%

Optimal Map Reduce Job Capacity Allocation in Cloud Systems

Malekimajd

Ardagna

Ciavotta

et al. 2015

SIGMETRICS Perform. Eval. Rev.

View full text Add to dashboard Cite

We are entering a Big Data world. Many sectors of our economy are now guided by data-driven decision processes. Big Data and business intelligence applications are facilitated by the MapReduce programming model while, at infrastructural layer, cloud computing provides flexible and cost effective solutions for allocating on demand large clusters. Capacity allocation in such systems is a key challenge to provide performance for MapReduce jobs and minimize cloud resource costs. The contribution of this paper is twofold: (i) we provide new upper and lower bounds for MapReduce job execution time in shared Hadoop clusters, (ii) we formulate a linear programming model able to minimize cloud resources costs and job rejection penalties for the execution of jobs of multiple classes with (soft) deadline guarantees. Simulation results show how the execution time of MapReduce jobs falls within 14% of our upper bound on average. Moreover, numerical analyses demonstrate that our method is able to determine the global optimal solution of the linear problem for systems including up to 1,000 user classes in less than 0.5 seconds.

show abstract

“…In this context, one of the main challenges [17,24] is that the execution time of a MapReduce job is generally unknown in advance. Because of this, predicting the execution time of Hadoop jobs is usually done empirically through experimentation, requiring a costly setup [10].…”

Section: Introductionmentioning

confidence: 99%

Fluid Petri Nets for the Performance Evaluation of MapReduce Applications

Gianniti

Rizzi

Barbierato

et al. 2017

Proceedings of the 10th EAI International Conference on Performance Evaluation Methodologies and Tools

View full text Add to dashboard Cite

Big Data applications allow to successfully analyze large amounts of data not necessarily structured, though at the same time they present new challenges. For example, predicting the performance of frameworks such as Hadoop can be a costly task, hence the necessity to provide models that can be a valuable support for designers and developers. This paper provides a new contribution in studying a novel modeling approach based on fluid Petri nets to predict MapReduce jobs execution time.The experiments we performed at CINECA, the Italian supercomputing center, have shown that the achieved accuracy is within 16% of the actual measurements on average.

show abstract

Joint optimization of overlapping phases in MapReduce

Cited by 60 publications

References 29 publications

Modeling Performance of Hadoop Applications: A Journey from Queueing Networks to Stochastic Well Formed Nets

Modeling Performance of Hadoop Applications: A Journey from Queueing Networks to Stochastic Well Formed Nets

Optimal Map Reduce Job Capacity Allocation in Cloud Systems

Fluid Petri Nets for the Performance Evaluation of MapReduce Applications

Contact Info

Product

Resources

About