An optimized MapReduce workflow scheduling algorithm for heterogeneous computing

Liu, Min; Al-Moalmi, Ammar; Li, Kenli; Li, Keqin

doi:10.1007/s11227-014-1335-2

Cited by 27 publications

(17 citation statements)

References 24 publications


“…If the estimated makespan exceeds deadline, then the next cheapest VM type ( ) is selected until the estimated execution time of the workflow on is lower than . The elapsed time between the deadline and the estimated makespan, denoted as the available spare time is calculated according to Equation (11). is distributed proportionally over all levels of the workflow on the basis of runtime of tasks according to Equation (12).…”

Section: Deadline Distribution (mentioning, confidence: 99%)
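The deadline-distribution step in the quote above can be sketched as follows. Equations (11) and (12) are not reproduced in the excerpt, so this is a minimal sketch under stated assumptions: the spare time is taken to be the deadline minus the estimated makespan, and each workflow level receives a share of it proportional to that level's runtime. The function names `select_vm_type`, `distribute_spare_time` and `level_sub_deadlines` are hypothetical, and the VM prices and makespan estimates in the usage block are made up.

```python
# Hypothetical sketch of the quoted deadline-distribution step.
# Assumptions (Equations (11) and (12) are not shown in the excerpt):
#   spare time  = deadline - estimated makespan
#   level share = spare time * level runtime / total runtime


def select_vm_type(vm_types, estimate_makespan, deadline):
    """Try VM types from cheapest upward until the estimated makespan is below the deadline.

    vm_types: list of (name, price); estimate_makespan: callable mapping a name to a makespan.
    """
    for name, _price in sorted(vm_types, key=lambda v: v[1]):
        makespan = estimate_makespan(name)
        if makespan < deadline:
            return name, makespan
    raise ValueError("no VM type meets the deadline")


def distribute_spare_time(level_runtimes, deadline, makespan):
    """Split the spare time over workflow levels in proportion to each level's runtime."""
    spare = deadline - makespan
    total = sum(level_runtimes)
    return [spare * r / total for r in level_runtimes]


def level_sub_deadlines(level_runtimes, shares):
    """Cumulative sub-deadline of each level: its runtime plus its share of the spare time."""
    deadlines, elapsed = [], 0.0
    for runtime, share in zip(level_runtimes, shares):
        elapsed += runtime + share
        deadlines.append(elapsed)
    return deadlines


if __name__ == "__main__":
    types = [("large", 0.20), ("small", 0.05), ("medium", 0.10)]
    estimates = {"small": 130.0, "medium": 90.0, "large": 60.0}  # made-up makespans
    vm, makespan = select_vm_type(types, estimates.get, deadline=100.0)
    print(vm, makespan)  # medium 90.0
    levels = [20.0, 50.0, 20.0]  # made-up level runtimes on the chosen VM type
    shares = distribute_spare_time(levels, 100.0, makespan)
    print(level_sub_deadlines(levels, shares))
```

In this example the level runtimes sum to the estimated makespan, so the last sub-deadline coincides with the overall deadline; if the real makespan estimate also includes data-transfer time, that equality would not hold.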

“…For example, dynamic provisioning of resources is not considered in [5][6][7][8], scalability in terms of large number of tasks is not considered in [9], heterogeneity of resources is not considered in [8,10], resource auto-scaling is not considered in [11], data dependencies are not considered in [12] and task clustering technique in [5] is not fully autonomous. Moreover, unlike multiple independent BoTs or single task-based workflows, the concept of using multiple connected and constrained BoTs for reducing the data transfer time is not considered in most existing scheduling algorithms [13,14].…”

Section: Introduction (mentioning, confidence: 99%)


Elastic Scheduling of Scientific Workflows under Deadline Constraints in Cloud Computing Environments

Anwar, Deng (2018), Future Internet


Scientific workflow applications are collections of several structured activities and fine-grained computational tasks. Scientific workflow scheduling in cloud computing is a challenging research topic due to its distinctive features. In cloud environments, it has become critical to perform efficient task scheduling that reduces scheduling overhead, minimizes cost and maximizes resource utilization while still meeting the user-specified overall deadline. This paper proposes a strategy, Dynamic Scheduling of Bag of Tasks based workflows (DSB), for scheduling scientific workflows with the aim of minimizing the financial cost of leasing Virtual Machines (VMs) under a user-defined deadline constraint. The proposed model groups the workflow into Bags of Tasks (BoTs) based on data dependency and priority constraints and thereafter optimizes the allocation and scheduling of BoTs on elastic, heterogeneous and dynamically provisioned cloud resources called VMs in order to attain the proposed method's objectives. The proposed approach considers pay-as-you-go Infrastructure as a Service (IaaS) clouds having inherent features such as elasticity, abundance, heterogeneity and VM provisioning delays. A trace-based simulation using benchmark scientific workflows representing real-world applications demonstrates a significant reduction in workflow computation cost while the workflow deadline is met. The results validate that the proposed model achieves better success rates in meeting deadlines and better cost efficiency than adapted state-of-the-art algorithms for similar problems.
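The abstract does not say exactly how DSB groups tasks into BoTs. One common way to obtain groups of mutually independent tasks is to assign each task a level equal to the length of the longest dependency path reaching it from an entry task: a task that depends on another always sits on a strictly higher level, so tasks sharing a level can run in parallel. The sketch below shows only this level-based grouping, as an assumption; it ignores DSB's priority constraints, and `group_into_bots` is a hypothetical name.

```python
from collections import defaultdict, deque


def group_into_bots(tasks, edges):
    """Group workflow tasks into bags by level (longest dependency path from an entry task).

    tasks: iterable of task ids; edges: iterable of (parent, child) pairs between those ids.
    Tasks on the same level have no dependency path between them, so each bag can run in parallel.
    """
    children = defaultdict(list)
    indegree = {t: 0 for t in tasks}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    level = {t: 0 for t in indegree}
    ready = deque(t for t, d in indegree.items() if d == 0)  # entry tasks
    seen = 0
    while ready:  # Kahn-style topological traversal
        t = ready.popleft()
        seen += 1
        for c in children[t]:
            level[c] = max(level[c], level[t] + 1)
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if seen != len(level):
        raise ValueError("workflow graph contains a cycle")

    bags = defaultdict(list)
    for t, lv in level.items():
        bags[lv].append(t)
    return [sorted(bags[lv]) for lv in sorted(bags)]


if __name__ == "__main__":
    # Made-up diamond workflow: a -> {b, c} -> d
    print(group_into_bots("abcd", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
    # [['a'], ['b', 'c'], ['d']]
```

Grouping by longest path rather than shortest path matters: a task with one short and one long incoming path must wait for the long one, so placing it on the later level keeps every bag free of internal dependencies.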


“…The default Hadoop scheduling scheme supports simple scheduling approaches such as first-come-first-serve and fair scheduling. A number of studies have been conducted [16][17][18][19][49][50][51] to improve Hadoop performance from different aspects.…”

Section: Related Work (mentioning, confidence: 99%)

gSched: a resource aware Hadoop scheduler for heterogeneous cloud computing environments

Caruana et al. (2016), Concurrency and Computation


MapReduce has become a major programming model for data-intensive applications in cloud computing environments. Hadoop, an open-source implementation of MapReduce, has been adopted by an increasingly wide user community. However, Hadoop suffers from task scheduling performance degradation in heterogeneous contexts because of its homogeneous design focus. This paper presents gSched, a resource-aware Hadoop scheduler that takes into account both the heterogeneity of computing resources and provisioning charges in task allocation in cloud computing environments. gSched is initially evaluated in an experimental Hadoop cluster and demonstrates enhanced performance compared with the default Hadoop scheduler. Further evaluations conducted on the Amazon EC2 cloud demonstrate the effectiveness of gSched in task allocation in heterogeneous cloud computing environments. Copyright © 2016 John Wiley & Sons, Ltd.


“…compared the MapReduce algorithms by exploring their merits and demerits. Zhuo Tang in paper [3] explored optimizing scheduling in a heterogeneous cluster in two phases (a job prioritizing phase and a task assignment phase). The authors added a new dimension by scheduling jobs according to whether they are I/O-intensive or compute-intensive. The jobs are then allotted to machines based on data locality.…”

Section: Previous Work (mentioning, confidence: 99%)
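The two-phase idea described in the quote (prioritize jobs, then assign tasks with a preference for data locality, with jobs labeled I/O- or compute-intensive) can be illustrated with a toy scheduler. Everything specific here is hypothetical: the `Job` fields, the bytes-per-CPU-second threshold used to label a job, priority as a plain integer, and the fallback of placing a non-local task on the node with the most free slots. The quote does not say how Tang's scheduler uses the category, so this sketch only records it alongside each placement.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: above this many bytes per CPU-second a job is labeled I/O-intensive.
IO_BYTES_PER_CPU_SECOND = 50e6


@dataclass
class Job:
    name: str
    priority: int          # lower value is scheduled first (assumption)
    io_bytes: float        # total bytes read and written
    cpu_seconds: float     # total CPU time
    blocks: list = field(default_factory=list)  # input block ids, one task per block


def category(job):
    """Label a job as I/O-intensive ("io") or compute-intensive ("cpu")."""
    return "io" if job.io_bytes / job.cpu_seconds > IO_BYTES_PER_CPU_SECOND else "cpu"


def schedule(jobs, block_locations, free_slots):
    """Phase 1: order jobs by priority. Phase 2: assign each task, preferring data-local nodes.

    block_locations: block id -> list of nodes holding a replica.
    free_slots: node -> number of free task slots.
    Returns (job, category, block, node, is_local) tuples; stops when no slots remain.
    """
    slots = dict(free_slots)
    plan = []
    for job in sorted(jobs, key=lambda j: j.priority):  # phase 1: job prioritizing
        label = category(job)
        for block in job.blocks:  # phase 2: task assignment
            local = [n for n in block_locations.get(block, []) if slots.get(n, 0) > 0]
            if local:
                node, is_local = local[0], True
            else:
                candidates = [n for n, s in slots.items() if s > 0]
                if not candidates:
                    return plan  # cluster full: remaining tasks wait for the next round
                node, is_local = max(candidates, key=slots.get), False
            slots[node] -= 1
            plan.append((job.name, label, block, node, is_local))
    return plan
```

Falling back to the least-loaded node keeps the sketch simple; a locality-aware scheduler could instead delay a non-local task briefly in the hope that a local slot frees up.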

Agerl Based Enhanced Map Reduce Technique in Cloud Scheduling

S, Kalaavathi (2016), IJCSE


Today's real-time big data applications mostly rely on the map-reduce (M-R) framework of Hadoop and the Hadoop Distributed File System (HDFS). Hadoop handles the complexity of such applications in a simpler manner. This paper works toward two goals: maximizing resource utilization and reducing the overall job completion time. Based on these goals, we have developed the Agent Centric Enhanced Reinforcement Learning Algorithm (AGERL). The algorithm concentrates on four dimensions: variable partitioning of tasks, calculation of the progress ratio of processing tasks including delays, XMPP-based multi-attribute query posting, and dynamic cluster restructuring based on Hopkins statistic assessment. An Enhanced Reinforcement Learning Process with the above features is employed to achieve the proposed goals. Finally, the performance gain is proved theoretically.
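The AGERL abstract lists Hopkins statistic assessment as a basis for dynamic cluster restructuring but gives no details. For reference, the standard Hopkins statistic, which measures how far a point set departs from spatial uniformity, can be computed as below. This is a generic sketch, not AGERL's procedure; the probe count `m`, the fixed seed, and sampling probes from the data's bounding box are assumptions. Conventions differ across references: some swap the roles of `u` and `w`, so that values near 0 indicate clustering.

```python
import math
import random


def hopkins(points, m=None, seed=0):
    """Hopkins statistic for a list of equal-length coordinate tuples (needs at least 2 points).

    Convention used here: H = sum(u) / (sum(u) + sum(w)), so H near 0.5 suggests
    uniformly spread data and H near 1 suggests clustered data.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    m = m or max(1, n // 10)  # probe count: about 10% of the data (assumption)
    lows = [min(p[k] for p in points) for k in range(dim)]
    highs = [max(p[k] for p in points) for k in range(dim)]

    def nearest(q, exclude=None):
        # Distance from q to its nearest data point, optionally skipping index `exclude`.
        return min(math.dist(q, p) for i, p in enumerate(points) if i != exclude)

    # u: from m uniform random probes inside the bounding box to the nearest data point.
    u = [nearest(tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))) for _ in range(m)]
    # w: from m sampled data points to their nearest *other* data point.
    w = [nearest(points[i], exclude=i) for i in rng.sample(range(n), m)]
    return sum(u) / (sum(u) + sum(w))
```

For tightly clustered data the random probes usually land far from any data point while sampled data points have close neighbours, which pushes the statistic toward 1; a threshold on it could then trigger restructuring, though the abstract does not say which threshold AGERL uses.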
