2013
DOI: 10.1016/j.peva.2013.08.013

Joint optimization of overlapping phases in MapReduce

Abstract: MapReduce is a scalable parallel computing framework for big data processing. It exhibits multiple processing phases, so an effective job scheduling mechanism is crucial for ensuring efficient resource utilization. This paper studies the scheduling challenge that results from the overlapping of the "map" and "shuffle" phases in MapReduce. We propose a new, general model for this scheduling problem, and validate this model using cluster experiments. Further, we prove that scheduling to minimize average re…
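As a rough sketch of the overlap the abstract refers to: in the toy model below, each job has a map workload and a shuffle workload, and a job's shuffle may proceed while its own map is still running, but cannot finish before it. The single map slot, single shuffle link, job sizes, and the `simulate` helper are all illustrative assumptions, not the paper's actual model.

```python
# Toy tandem model of overlapping map and shuffle phases (a sketch,
# not the paper's formulation). One map slot, one shuffle link;
# all jobs arrive at time 0 and are served in the given order.
def simulate(jobs, order):
    """Return the mean response time for serving `jobs` in `order`.

    jobs: list of (map_work, shuffle_work) pairs.
    """
    t_map = 0.0   # time at which the map slot becomes free
    t_shf = 0.0   # time at which the shuffle link becomes free
    total = 0.0
    for j in order:
        m, s = jobs[j]
        map_start = t_map
        t_map = map_start + m
        # Shuffle overlaps the job's own map phase: it may begin as soon
        # as the map starts producing output, but cannot complete before
        # the map finishes (the last intermediate data must exist first).
        t_shf = max(max(t_shf, map_start) + s, t_map)
        total += t_shf
    return total / len(jobs)

jobs = [(5, 1), (1, 4), (2, 2)]
fifo = simulate(jobs, range(len(jobs)))
# Shortest-total-work first, a classic heuristic for mean response time:
spt = simulate(jobs, sorted(range(len(jobs)), key=lambda j: sum(jobs[j])))
print(f"FIFO: {fifo:.2f}  SPT-like: {spt:.2f}")
```

On this toy instance, serving shorter jobs first already reduces mean response time, which is the flavor of result the truncated abstract alludes to.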

Cited by 60 publications (10 citation statements) · References 29 publications

Citation statements, ordered by relevance:
“…In this context, one of the main challenges [25,33] is that the execution time of a MapReduce job is generally unknown in advance. Because of this, predicting the execution time of Hadoop jobs is usually done empirically through experimentation, requiring a costly setup [15].…”
Section: Introduction (mentioning)
confidence: 99%
“…The work in [17] models the execution of Map tasks through a tandem queue with overlapping phases and provides very efficient runtime scheduling solutions for the joint optimization of the Map and copy/shuffle phases. The authors show that their runtime scheduling algorithms closely match the performance of the offline optimal version.…”
Section: Related Work (mentioning)
confidence: 99%
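The comparison this statement mentions, runtime schedules tracking the offline optimum, can be mimicked on the toy model above: with all jobs known upfront and a small instance, the offline optimum is a brute-force search over service orders. The snippet below reuses the hypothetical `simulate` helper from the sketch after the abstract; it is an illustration only, not the algorithm of [17].

```python
from itertools import permutations

# Exhaustive search over all service orders: feasible only for small n,
# but it gives an exact offline-optimal baseline on the toy model.
def offline_optimal(jobs):
    return min(simulate(jobs, p) for p in permutations(range(len(jobs))))

jobs = [(5, 1), (1, 4), (2, 2), (3, 0.5)]
spt = simulate(jobs, sorted(range(len(jobs)), key=lambda j: sum(jobs[j])))
print(f"SPT-like heuristic: {spt:.2f}  offline optimal: {offline_optimal(jobs):.2f}")
```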
“…Nevertheless, MapReduce applications have evolved, and it is not uncommon that large queries, submitted by different user classes, need to be performed on shared clusters, possibly with some guarantees on their execution time. In this context, the main drawback [17,26] is that the execution time of a MapReduce job is generally unknown in advance. In such systems, capacity allocation becomes one of the most important aspects.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this context, one of the main challenges [17,24] is that the execution time of a MapReduce job is generally unknown in advance. Because of this, predicting the execution time of Hadoop jobs is usually done empirically through experimentation, requiring a costly setup [10].…”
Section: Introduction (mentioning)
confidence: 99%