A Dynamic MapReduce Scheduler for Heterogeneous Workloads

Tian, Chao; Zhou, Haojie; He, Yongqiang; Zha, Li

doi:10.1109/gcc.2009.19

Cited by 125 publications

(49 citation statements)

References 12 publications


“…The approach of SkewTune [39] greatly mitigates the issue of skew in task processing times with a plug-in module that seamlessly integrates in Hadoop, which can be used in conjunction with HFSP. Tian et al [13] propose a mechanism where IO-bound and CPU-bound jobs run concurrently, benefitting from the absence of conflicts on resources between them. We remark that also in this case it is possible to benefit from size-based scheduling, as it can be applied separately on the IO- and CPU-bound queues.…”

Section: Fairness and QoS (mentioning)

confidence: 99%
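
The mechanism described in the statement above (separate queues for IO-bound and CPU-bound jobs, with one job of each kind running concurrently) can be illustrated with a minimal sketch. This is not the scheduler from Tian et al; the `Job` class, the `io_rate_mb_s` field, and the threshold value are all hypothetical names and numbers chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    io_rate_mb_s: float  # hypothetical: measured I/O demand of one task


# Hypothetical cut-off: at or above this I/O demand a job counts as IO-bound.
IO_BOUND_THRESHOLD_MB_S = 50.0


def classify(job):
    """Return "io" for IO-bound jobs and "cpu" for CPU-bound jobs."""
    return "io" if job.io_rate_mb_s >= IO_BOUND_THRESHOLD_MB_S else "cpu"


def schedule(jobs):
    """Pair up to one IO-bound and one CPU-bound job per scheduling round."""
    queues = {"io": deque(), "cpu": deque()}
    for job in jobs:
        queues[classify(job)].append(job)

    rounds = []
    while queues["io"] or queues["cpu"]:
        batch = []
        for kind in ("io", "cpu"):
            if queues[kind]:
                batch.append(queues[kind].popleft().name)
        rounds.append(batch)
    return rounds


jobs = [Job("sort", 80), Job("pi", 1), Job("grep", 60),
        Job("kmeans", 5), Job("wordcount", 3)]
rounds = schedule(jobs)
```

Each queue is served in FIFO order here; as the statement notes, a size-based policy could be applied to each queue separately instead.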

“…These sizes are difficult to obtain a priori, even though various recent works tackle the task of estimating MapReduce job sizes [9]-[13] (we discuss them in more detail in Section V); in addition, Lu et al evaluate the impact of estimation errors on size-based scheduling for synthetic traces [14]. Unfortunately, the combination of these works is not sufficient to understand which level of estimation errors would be acceptable for size-based scheduling in our context of extremely diverse job sizes.…”

Section: B. Impact of Size Estimation Errors (mentioning)

confidence: 99%


HFSP: Size-based scheduling for Hadoop

Pastorelli, Barbuzzi, Carra et al. 2013

2013 IEEE International Conference on Big Data


Size-based scheduling with aging has long been recognized as an effective approach to guarantee fairness and near-optimal system response times. We present HFSP, a scheduler introducing this technique to a real, multi-server, complex and widely used system such as Hadoop. Size-based scheduling requires a priori job size information, which is not available in Hadoop: HFSP builds such knowledge by estimating it on-line during job execution. Our experiments, which are based on realistic workloads generated via a standard benchmarking suite, point to a significant decrease in system response times with respect to the widely used Hadoop Fair scheduler, and show that HFSP is largely tolerant to job size estimation errors.
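
As a rough illustration of the idea in this abstract (estimate job size while the job runs, then favor the job with the least estimated work left), the following sketch may help. It is not HFSP's estimator or scheduler: the function names are hypothetical, the estimate is a plain mean over sampled task durations, and aging is omitted entirely.

```python
def estimate_size(sample_task_seconds, total_tasks):
    """Extrapolate total job size from a few completed sample tasks.

    Assumption: every task of the job takes about as long as the
    sampled ones, so size = mean task duration * number of tasks.
    """
    mean = sum(sample_task_seconds) / len(sample_task_seconds)
    return mean * total_tasks


def next_job(remaining_size):
    """Pick the job with the smallest estimated remaining size."""
    return min(remaining_size, key=remaining_size.get)


# Hypothetical jobs: estimates come from two sampled tasks each.
remaining = {
    "job_a": estimate_size([2.0, 4.0], total_tasks=10),
    "job_b": estimate_size([1.0, 2.0], total_tasks=8),
}
chosen = next_job(remaining)
```

Because the estimates are refreshed as more tasks finish, a real scheduler would recompute `remaining` over time; this sketch evaluates it once.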


“…In the first category, we find works like Kumar et al (2012), Tian et al (2009) or Rasooli and Down (2012). These works assume that most jobs are periodic and demand similar CPU, network and disk usage characteristics.…”

Section: Context-awareness (mentioning)

confidence: 99%

MapReduce Challenges on Pervasive Grids

Steffenel, Flauzac, Charão et al. 2014

Journal of Computer Science


This study presents advances in designing and implementing scalable techniques to support the development and execution of MapReduce applications in pervasive distributed computing infrastructures, in the context of the PER-MARE project. A pervasive framework for MapReduce applications is very useful in practice, especially in scientific, enterprise and educational centers that have many unused or underused computing resources, which can be fully exploited to solve relevant problems that demand large computing power, such as scientific computing applications and big data processing. In this study, we propose multiple techniques to support volatility and heterogeneity on MapReduce, by applying two complementary approaches: improving the Apache Hadoop middleware by including context-awareness and fault-tolerance features; and providing an alternative pervasive grid implementation, fully adapted to dynamic environments. The main design and implementation decisions for both alternatives are described and validated through experiments, demonstrating that our approaches provide high reliability when executing in pervasive environments. The analysis of the experiments also leads to several insights on the requirements and constraints of dynamic and volatile systems, reinforcing the importance of context-aware information and advanced fault-tolerance features to provide efficient and reliable MapReduce services on pervasive grids.


“…The approach of SkewTune [48] greatly mitigates the issue of skew in task processing times with a plug-in module that seamlessly integrates in Hadoop, which can be used in conjunction with HFSP. Tian et al [49] propose a mechanism where IO-bound and CPU-bound jobs run concurrently, benefitting from the absence of conflicts on resources between them. We remark that also in this case it is possible to benefit from size-based scheduling, as it can be applied separately on the IO- and CPU-bound queues.…”

Section: Related Work (mentioning)

confidence: 99%

HFSP: Bringing Size-Based Scheduling To Hadoop

Pastorelli, Carra, Dell’Amico et al. 2017

IEEE Trans. Cloud Comput.


Size-based scheduling with aging has been recognized as an effective approach to guarantee fairness and near-optimal system response times. We present HFSP, a scheduler introducing this technique to a real, multi-server, complex and widely used system such as Hadoop. Size-based scheduling requires a priori job size information, which is not available in Hadoop: HFSP builds such knowledge by estimating it on-line during job execution. Our experiments, which are based on realistic workloads generated via a standard benchmarking suite, point to a significant decrease in system response times with respect to the widely used Hadoop Fair scheduler, without impacting the fairness of the scheduler, and show that HFSP is largely tolerant to job size estimation errors.

