FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads

Wolf, Joel L.; Rajan, Deepta; Hildrum, Kirsten; Khandekar, Rohit; Kumar, V. Ravi; Parekh, Sujay; Wu, Kun-Lung; Balmin, Andrey

doi:10.1007/978-3-642-16955-7_1

Cited by 88 publications

(87 citation statements)

References 12 publications


“…Flex [38] is a size-based scheduler for Hadoop which is available as a proprietary commercial solution. In Flex, "fairness" is defined as avoiding job starvation and guaranteed by allocating a part of the cluster according to Hadoop's FAIR scheduler; size-based scheduling (without aging) is then performed only on the remaining set of nodes.…”

Section: Fairness and QoS (mentioning)

confidence: 99%

HFSP: Size-based scheduling for Hadoop

Pastorelli, Barbuzzi, Carra, et al. 2013

2013 IEEE International Conference on Big Data


Abstract: Size-based scheduling with aging has, for long, been recognized as an effective approach to guarantee fairness and near-optimal system response times. We present HFSP, a scheduler introducing this technique to a real, multi-server, complex and widely used system such as Hadoop. Size-based scheduling requires a priori job size information, which is not available in Hadoop: HFSP builds such knowledge by estimating it on-line during job execution. Our experiments, which are based on realistic workloads generated via a standard benchmarking suite, pinpoint at a significant decrease in system response times with respect to the widely used Hadoop Fair scheduler, and show that HFSP is largely tolerant to job size estimation errors.
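The abstract's claim that size-based scheduling lowers response times can be illustrated with a toy calculation. The sketch below is not HFSP or FLEX: it assumes a single server, no preemption, no aging, and jobs that all arrive at time 0 with sizes known in advance. The job sizes and the helper `mean_response_time` are hypothetical, chosen only for illustration.

```python
from itertools import accumulate


def mean_response_time(sizes):
    """Mean completion time when jobs run back to back in the given order.

    Assumes one server and that every job arrives at time 0, so a job's
    response time equals its completion time.
    """
    completions = list(accumulate(sizes))
    return sum(completions) / len(completions)


# Hypothetical job sizes (arbitrary time units), listed in arrival order.
job_sizes = [8, 1, 3]

fifo = mean_response_time(job_sizes)                # run in arrival order
size_based = mean_response_time(sorted(job_sizes))  # run smallest job first

print(f"FIFO: {fifo:.2f}, smallest-first: {size_based:.2f}")
```

Running the smallest job first keeps short jobs from waiting behind long ones, which is why the mean drops. A real scheduler must also estimate job sizes and guard against starvation of large jobs, which this sketch ignores.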


“…Flex [43] is a proprietary Hadoop size-based scheduler. In Flex, "fairness" is defined as avoiding job starvation and guaranteed by allocating a part of the cluster according to Hadoop's Fair scheduler; size-based scheduling (without aging) is then performed only on the remaining set of nodes.…”

Section: Related Work (mentioning)

confidence: 99%

HFSP: Bringing Size-Based Scheduling To Hadoop

Pastorelli, Carra, Dell’Amico, et al. 2017

IEEE Trans. Cloud Comput.


Abstract: Size-based scheduling with aging has been recognized as an effective approach to guarantee fairness and near-optimal system response times. We present HFSP, a scheduler introducing this technique to a real, multi-server, complex and widely used system such as Hadoop. Size-based scheduling requires a priori job size information, which is not available in Hadoop: HFSP builds such knowledge by estimating it on-line during job execution. Our experiments, which are based on realistic workloads generated via a standard benchmarking suite, pinpoint at a significant decrease in system response times with respect to the widely used Hadoop Fair scheduler, without impacting the fairness of the scheduler, and show that HFSP is largely tolerant to job size estimation errors.


“…Pietro Michiardi et al design a scheduler labelled FSP [23], which considers both fairness and efficiency rather than ours efficiency-only objective, and fairness-only scheduling as HFS [16], similar as Delay Scheduling [17] expect for the job-level resource provision like ours rather than task-level used in lots of current Hadoop schedulers, what's more, FSP permits preemption by job suspension. Joel Wolf et al propose a scheduling optimizer for MapReduce workloads with shared scans named as CIRCUMFLEX [24], which aims on optimizing concurrent jobs with share inputs, on the other hand, we assume jobs are totally independent, however, we will do this kind of optimization in future work. Hammoud et al propose center-of-gravity reduce task scheduling aiming to lower MapReduce network traffic [25], which model reduce input distribution as mass distribution model, by properly assign reduce tasks to save network cost, so we can call it data locality in reduce phase, which is not a consideration in our study but in future work.…”

Section: Related Work (mentioning)

confidence: 99%

An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

Zhao, Yang, Fan, et al. 2013

IEICE Trans. Inf. & Syst.


Summary: Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of an MapReduce cluster running lots of independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors to improve computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suitable for MapReduce environment, there are some in-used schedulers for the popular open-source Hadoop MapReduce implementation, however, they can not well optimize both factors. Our main objective is to minimize total flowtime of all jobs, given it's a strong NP-hard problem, we adopt some effective heuristics to seek satisfied solution. In this paper, we formalize the scheduling problem as job selection problem, a load balance aware job selection algorithm is proposed, in task level we design a strict data locality tasks scheduling algorithm for map tasks on map machines and a load balance aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
