In High Performance Computing (HPC) infrastructures, resources are controlled by batch systems and may not be readily available, which can negatively impact applications with deadlines through long queue waiting times. This is particularly noticeable for data-intensive and low-latency workflows, where resource planning and timely allocation are key requirements for efficient processing. On the one hand, allocating the maximum capacity expected for a scientific workflow guarantees the fastest possible execution time, at the cost of idle infrastructure resources, extended queue waiting times, and costly resource usage. On the other hand, dynamically allocating resources according to the requirements of each workflow stage optimizes resource usage, but may also increase the total workflow makespan. With the aim of enabling new scheduling strategies and features for scientific workflows, we propose ASA, the Adaptive Scheduling Architecture, a novel scheduling method with proven convergence that reduces perceived queue waiting times and optimizes resource usage and planning for scientific workflows. The algorithm uses reinforcement learning to estimate queue waiting times and, based on these estimates, proactively submits resource change requests with the goal of minimizing total inter-stage waiting times, idle resources, and makespan. The algorithm balances learning the waiting times against acting on what has been learnt so far, and thus handles the exploration-exploitation trade-off. Experiments with real scientific workflows on two supercomputers show that ASA combines the best of the two aforementioned resource allocation approaches, reducing average workflow queue waiting times and makespan by up to 10% and 2%, respectively, with up to 100% prediction accuracy, while achieving near-optimal resource utilization.
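
To illustrate the exploration-exploitation idea behind such a waiting-time estimator, the following minimal Python sketch maintains a running estimate of per-queue waiting times and trades off exploring queues against exploiting the queue currently believed to be fastest. The class name, the epsilon-greedy policy, the learning rate, and the submission_lead_time helper are illustrative assumptions for exposition only and do not describe the actual ASA implementation.

    import random

    class WaitTimeEstimator:
        """Illustrative epsilon-greedy estimator of per-queue waiting times.

        Hypothetical sketch: the queue names, learning rate, and update rule
        are assumptions, not the ASA algorithm itself.
        """

        def __init__(self, queues, epsilon=0.1, alpha=0.3):
            self.epsilon = epsilon                      # exploration probability
            self.alpha = alpha                          # learning rate of the running estimate
            self.estimate = {q: 0.0 for q in queues}    # estimated waiting time (seconds) per queue

        def choose_queue(self):
            # Exploration: occasionally sample a random queue to refresh its estimate.
            if random.random() < self.epsilon:
                return random.choice(list(self.estimate))
            # Exploitation: pick the queue currently believed to have the shortest wait.
            return min(self.estimate, key=self.estimate.get)

        def update(self, queue, observed_wait):
            # Exponentially weighted update toward the newly observed waiting time.
            self.estimate[queue] += self.alpha * (observed_wait - self.estimate[queue])

        def submission_lead_time(self, queue, seconds_until_stage_start):
            # Submit the resource change request early enough to absorb the expected wait.
            return max(0.0, seconds_until_stage_start - self.estimate[queue])

In this sketch, each completed job updates the estimate for the queue it ran in, and the lead-time computation shows how an estimated wait could be used to submit a stage's resource request ahead of time so that resources become available close to when the stage actually needs them.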