2014
DOI: 10.1007/s11227-014-1335-2

An optimized MapReduce workflow scheduling algorithm for heterogeneous computing

Abstract: The MapReduce framework is considered an effective solution for large-scale parallel data processing. This paper treats a massive data processing workflow as a DAG of MapReduce jobs. In a heterogeneous computing environment, the computation speed of a slot can differ from job to job. To address this problem, the paper proposes an optimized MapReduce workflow scheduling algorithm comprising a job prioritizing phase and a task assignment phase. First, th…
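The abstract is cut off before it says how job priorities are computed, so the following is only a minimal sketch of the general idea of prioritizing jobs in a workflow DAG. It assumes an upward-rank heuristic (a job's own run time plus the longest chain of successors), which is a common list-scheduling choice and not necessarily the paper's method. The job names, `successors`, `runtime`, and `upward_rank` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical workflow DAG: each job maps to the jobs that depend on it.
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Hypothetical estimated run time of each MapReduce job (arbitrary units).
runtime = {"A": 4.0, "B": 3.0, "C": 6.0, "D": 2.0}


@lru_cache(maxsize=None)
def upward_rank(job):
    """Length of the longest path from this job to the end of the workflow.

    Assumption: priority is the job's own run time plus the largest
    upward rank among its successors (0 for an exit job).
    """
    return runtime[job] + max(
        (upward_rank(s) for s in successors[job]), default=0.0
    )


# Jobs with a longer remaining chain are scheduled first.
priority_order = sorted(successors, key=upward_rank, reverse=True)
print(priority_order)
```

Sorting by decreasing upward rank also respects the dependencies here, because a job's rank is always strictly greater than that of any of its successors when run times are positive.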

citations
Cited by 27 publications
(17 citation statements)
references
References 24 publications
“…If the estimated makespan exceeds deadline, then the next cheapest VM type ( ) is selected until the estimated execution time of the workflow on is lower than . The elapsed time between the deadline and the estimated makespan, denoted as the available spare time is calculated according to Equation (11). is distributed proportionally over all levels of the workflow on the basis of runtime of tasks according to Equation (12).…”
Section: Deadline Distribution (mentioning)
confidence: 99%
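The proportional distribution described in this citation statement can be illustrated with a small sketch. Equations (11) and (12) are not reproduced in the excerpt, so the formulas below are assumptions: spare time is taken as the deadline minus the estimated makespan, the makespan is taken as the sum of the level run times, and each level receives a share of the spare time proportional to its run time. The function name and the sample numbers are hypothetical.

```python
def distribute_spare_time(level_runtimes, deadline):
    """Return a time budget per workflow level.

    Assumptions (not taken from the cited equations):
    - levels execute one after another, so makespan = sum of level run times;
    - spare time = deadline - makespan;
    - each level gets spare time in proportion to its run time.
    """
    makespan = sum(level_runtimes)
    spare = deadline - makespan
    if spare < 0:
        raise ValueError("estimated makespan exceeds the deadline")
    if makespan == 0:
        return [0.0 for _ in level_runtimes]
    return [
        runtime + spare * runtime / makespan for runtime in level_runtimes
    ]


# Three levels with run times 10, 30 and 20 and a deadline of 90:
# makespan is 60, so 30 units of spare time are shared out as 5, 15 and 10.
budgets = distribute_spare_time([10.0, 30.0, 20.0], deadline=90.0)
print(budgets)
```

The budgets always sum to the deadline, so a workflow that meets every per-level budget also meets the overall deadline under these assumptions.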
“…For example, dynamic provisioning of resources is not considered in [5][6][7][8], scalability in terms of large number of tasks is not considered in [9], heterogeneity of resources is not considered in [8,10], resource auto-scaling is not considered in [11], data dependencies are not considered in [12] and task clustering technique in [5] is not fully autonomous. Moreover, unlike multiple independent BoTs or single task-based workflows, the concept of using multiple connected and constrained BoTs for reducing the data transfer time is not considered in most existing scheduling algorithms [13,14].…”
Section: Introduction (mentioning)
confidence: 99%
“…The default Hadoop scheduling scheme supports simple scheduling approaches such as first-come-first-serve and fair scheduling. A number of studies have been conducted [16][17][18][19][49][50][51] to improve Hadoop performance from different aspects.…”
Section: Related Work (mentioning)
confidence: 99%
“…worked out the comparison of the MapReduce algorithms by exploring the merits and demerits. Zhuo Tang in paper [3] explored an idea of optimizing the scheduling in a heterogeneous cluster in two phases (job prioritizing phase and task assignment phase). The authors gave a new dimension of scheduling the jobs based on the category of I/O intensive and Compute intensive. Then the jobs are allotted to the machines based on data locality.…”
Section: Previous Work (mentioning)
confidence: 99%