2009 Eighth International Conference on Grid and Cooperative Computing 2009
DOI: 10.1109/gcc.2009.19

A Dynamic MapReduce Scheduler for Heterogeneous Workloads

Abstract: MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, it is common for I/O-bound jobs and CPU-bound jobs, which demand different resources, to run simultaneously in the same cluster. In the MapReduce framework, parallelization of these two kinds of jobs has not been considered. In this paper, we give a new view of the MapReduce model, and classify the MapReduce workloads into three categories based on their CPU an…

Citations: cited by 125 publications (49 citation statements)
References: 12 publications
“…The approach of SkewTune [39] greatly mitigates the issue of skew in task processing times with a plug-in module that seamlessly integrates in Hadoop, which can be used in conjunction with HFSP. Tian et al [13] propose a mechanism where IO-bound and CPU-bound jobs run concurrently, benefitting from the absence of conflicts on resources between them. We remark that also in this case it is possible to benefit from size-based scheduling, as it can be applied separately on the IO- and CPU-bound queues.…”
Section: Fairness and QoS (mentioning)
confidence: 99%
“…These sizes are difficult to obtain a priori, even though various recent works tackle the task of estimating MapReduce job sizes [9]-[13] (we discuss them in more detail in Section V); in addition, Lu et al evaluate the impact of estimation errors on size-based scheduling for synthetic traces [14]. Unfortunately, the combination of these works is not sufficient to understand which level of estimation errors would be acceptable for size-based scheduling in our context of extremely diverse job sizes.…”
Section: B Impact Of Size Estimation Errors (mentioning)
confidence: 99%
“…In the first category, we find works like Kumar et al (2012), Tian et al (2009) or Rasooli and Down (2012). These works assume that most jobs are periodic and demand similar CPU, network and disk usage characteristics.…”
Section: Context-awareness (mentioning)
confidence: 99%
“…The approach of SkewTune [48] greatly mitigates the issue of skew in task processing times with a plug-in module that seamlessly integrates in Hadoop, which can be used in conjunction with HFSP. Tian et al [49] propose a mechanism where IO-bound and CPU-bound jobs run concurrently, benefitting from the absence of conflicts on resources between them. We remark that also in this case it is possible to benefit from size-based scheduling, as it can be applied separately on the IO- and CPU-bound queues.…”
Section: Related Work (mentioning)
confidence: 99%
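
The idea quoted in the citation statements, keeping IO-bound and CPU-bound jobs in separate queues and applying size-based ordering within each, can be illustrated with a minimal Python sketch. This is not the scheduler from the paper: the class names, the "io"/"cpu" labels, the job names, and the assumption that each job arrives with a known kind and an estimated size are all hypothetical, and the abstract's three workload categories are simplified to two here.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    est_size: float                    # estimated job size (assumed to be given)
    name: str = field(compare=False)   # hypothetical job name
    kind: str = field(compare=False)   # "io" or "cpu" (assumed to be known)


class TwoQueueScheduler:
    """Toy scheduler: one smallest-first queue per resource kind."""

    def __init__(self):
        # Each queue is a min-heap ordered by estimated size.
        self.queues = {"io": [], "cpu": []}

    def submit(self, job):
        heapq.heappush(self.queues[job.kind], job)

    def next_pair(self):
        # Take the smallest job from each non-empty queue so that an
        # IO-bound and a CPU-bound job can run side by side.
        return [heapq.heappop(q) for q in self.queues.values() if q]


scheduler = TwoQueueScheduler()
scheduler.submit(Job(30, "sort", "io"))
scheduler.submit(Job(10, "grep", "io"))
scheduler.submit(Job(20, "pi", "cpu"))
first = [j.name for j in scheduler.next_pair()]
second = [j.name for j in scheduler.next_pair()]
print(first, second)
```

In this sketch the smaller IO-bound job is paired with the only CPU-bound job first, and the remaining IO-bound job runs alone afterwards; a real scheduler would also need to estimate sizes and detect job kinds at run time.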