Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization

Jones, James Patton; Nitzberg, Bill

doi:10.1007/3-540-47954-6_1

Cited by 72 publications (49 citation statements); references 5 publications.


“…As observed in various previous studies [16], the StartUp policy has the lowest utilization, from 20% to 30%, in line with traditional provisioning policies that only look at peak workloads. As in previous studies of utilization [32], ODS, the commonly used policy in current data centers, achieves only a moderate utilization of 65% to 80%. Our portfolio scheduler combines consistently low job slowdown and wait time with low cost (through high utilization).…”

Section: Results of Synthetic Workloads (mentioning; confidence: 55%)

A Periodic Portfolio Scheduler for Scientific Computing in the Data Center

Deng, Verboon, Ren, et al. (2014)

Job Scheduling Strategies for Parallel Processing


Abstract. The popularity of data centers in scientific computing has led to new architectures, new workload structures, and growing customer bases. As a consequence, selecting efficient scheduling algorithms for the data center is an increasingly costly and difficult challenge. To address this challenge, and in contrast to previous work on scheduling for scientific workloads, we focus in this work on portfolio scheduling: the dynamic selection and use of a scheduling policy, depending on the current system and workload conditions, from a portfolio of multiple policies. We design a periodic portfolio scheduler for the workload of the entire data center, and equip it with a portfolio of resource provisioning and allocation policies. Through simulation based on real and synthetic workload traces, we show evidence that portfolio scheduling can automatically select the scheduling policy to match both user and data center objectives, and that portfolio scheduling can perform well in the data center, relative to its constituent policies.
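The selection step described in this abstract can be illustrated with a minimal Python sketch. It assumes each policy in the portfolio comes with a cheap "what-if" cost model that is evaluated on the current queue at every period. The `Policy` class, `select_policy`, the toy single-resource `mean_wait` model, and the two example policies are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Job = Tuple[int, float]  # hypothetical job record: (processors, estimated runtime)

@dataclass
class Policy:
    name: str
    # Hypothetical "what-if" cost model: estimated cost (lower is better)
    # of handling the current queue with this policy.
    cost: Callable[[List[Job]], float]

def select_policy(portfolio: List[Policy], queue: List[Job]) -> Policy:
    """One periodic selection step: evaluate every policy, keep the cheapest."""
    return min(portfolio, key=lambda p: p.cost(queue))

def mean_wait(order: List[Job]) -> float:
    """Mean wait if jobs run one after another on a single resource (toy model)."""
    elapsed, total = 0.0, 0.0
    for _, runtime in order:
        total += elapsed      # this job waits for everything scheduled before it
        elapsed += runtime
    return total / len(order) if order else 0.0

portfolio = [
    Policy("FCFS", mean_wait),                                      # queue order
    Policy("SJF", lambda q: mean_wait(sorted(q, key=lambda j: j[1]))),  # shortest first
]

queue = [(1, 100.0), (1, 10.0), (1, 1.0)]
print(select_policy(portfolio, queue).name)  # SJF
```

For this queue the toy model gives a mean wait of 70 under FCFS and 4 under shortest-job-first, so the selector switches to SJF; a real portfolio scheduler would re-run this choice each period as the queue and system state change.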


“…This suggests that at any given moment some fraction of the usable nodes will be sitting idle, even when jobs are waiting to run in the queue. In fact, studies have shown that FCFS only manages to achieve around 40-60% node utilization, while EASY does somewhat better at around 70% node utilization [32].…”
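Node utilization, as used in this quote, is the fraction of available node-time actually spent running jobs. The hypothetical `node_utilization` helper below computes it over the span of a finished schedule (a simplifying assumption; published studies measure it over a whole trace period). The toy schedules, with invented numbers, show how a small job stuck behind a blocked job leaves nodes idle under FCFS.

```python
from typing import List, Tuple

# Hypothetical record for a finished job: (start time, end time, nodes used).
Run = Tuple[float, float, int]

def node_utilization(runs: List[Run], total_nodes: int) -> float:
    """Busy node-time divided by available node-time over the schedule's span."""
    if not runs:
        return 0.0
    start = min(s for s, _, _ in runs)
    end = max(e for _, e, _ in runs)
    if end <= start:
        return 0.0
    busy = sum((e - s) * nodes for s, e, nodes in runs)
    return busy / (total_nodes * (end - start))

# Toy 4-node machine: a 3-node job runs, a 4-node job must wait for it,
# and a 1-node job is queued behind the 4-node job.
fcfs = [(0, 10, 3), (10, 20, 4), (20, 30, 1)]       # 1-node job waits its turn
backfilled = [(0, 10, 3), (0, 10, 1), (10, 20, 4)]  # 1-node job uses the idle node

print(node_utilization(fcfs, 4))        # 80 busy of 120 available node-seconds
print(node_utilization(backfilled, 4))  # 80 busy of 80 available node-seconds
```

Under FCFS one node sits idle for the first 10 seconds and three sit idle for the last 10, giving 2/3 utilization; letting the 1-node job run early, as backfilling would, fills the machine in this example.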

Section: Batch Scheduling (mentioning; confidence: 99%)

Dynamic Fractional Resource Scheduling for cluster platforms

Stillwell (2010)

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW)


Experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). The experiments using our prototype system were carried out on machines owned by the Department of Computer Science and Engineering at the University of California, San Diego with the permission of Geoffrey Voelker. System administration and much assistance were provided by Gjergji Zyba.

Abstract. This research focuses on the problem of job scheduling on homogeneous computational clusters. Clusters are widely used today for a variety of purposes, including high-performance scientific computing and Internet service hosting. While clusters may have impressive aggregate performance metrics, they are really only collections of fairly modest machines, which makes scheduling jobs for the best performance a non-trivial problem. Most clusters also need to be shared among users to amortize their start-up and maintenance costs, and ensuring that these users are treated fairly further adds to the difficulty. Existing approaches to scheduling attempt to address both of these issues, but have several limitations.

We propose a novel approach, called Dynamic Fractional Resource Scheduling (DFRS), to sharing homogeneous cluster computing platforms among competing jobs. The key features of DFRS are that it leverages existing virtual machine technology to share resources more efficiently, and that it defines and optimizes a user-centric metric capturing notions of both performance and fairness.

In this dissertation we explain the principles behind DFRS and its advantages over the current state of the art, develop a theoretical model of resource sharing, design heuristics to optimize the proposed metric within the given framework, implement and run simulations comparing DFRS to traditional approaches using popular and accepted performance metrics, and finally develop and test a prototype implementation based on existing technologies. Our results show that it is possible to develop heuristic algorithms that give results reasonably close to theoretical bounds for a variety of cases, that resource requirements are well within the capabilities of modern systems, and that for some scenarios DFRS can provide orders-of-magnitude improvements in performance over current approaches.


“…This problem is solved by backfilling algorithms, which allow small jobs from the back of the queue to execute before larger jobs that arrived earlier, thus utilizing the idle processors, while the latter are waiting for enough processors to be freed [15]. Backfilling is known to greatly increase user satisfaction since small jobs tend to get through faster, while bypassing large ones [11,2]. Note that backfilling algorithms require the jobs' runtimes to be known in advance.…”
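The backfilling rule in this excerpt can be sketched as EASY-style backfilling, one common variant and the one assumed here: only the first queued job gets a reservation, and a later job may jump ahead if it fits in the currently free nodes and either finishes (by its estimated runtime) before the reservation starts, or uses only nodes the reserved job will not need. The function name `easy_backfill` and the tuple layouts are hypothetical; runtimes are treated as user estimates, and every job is assumed to fit on the machine.

```python
from typing import List, Tuple

Job = Tuple[str, int, float]  # hypothetical: (name, nodes, estimated runtime)
Running = Tuple[float, int]   # hypothetical: (estimated end time, nodes held)

def easy_backfill(now: float, free: int, running: List[Running],
                  queue: List[Job]) -> List[str]:
    """Names of queued jobs to start at time `now`, EASY-style (sketch)."""
    started: List[str] = []
    running, queue = list(running), list(queue)
    # 1. Start jobs in arrival order for as long as they fit.
    while queue and queue[0][1] <= free:
        name, nodes, runtime = queue.pop(0)
        free -= nodes
        running.append((now + runtime, nodes))
        started.append(name)
    if not queue:
        return started
    # 2. Reservation for the blocked head job: the "shadow time" is the earliest
    #    estimated end time at which enough nodes are free; "extra" is the number
    #    of nodes left over at that moment.
    head_nodes = queue[0][1]
    avail, shadow, extra = free, float("inf"), 0
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head_nodes:
            shadow, extra = end, avail - head_nodes
            break
    # 3. Backfill later jobs only if they cannot delay the reservation.
    for name, nodes, runtime in queue[1:]:
        if nodes > free:
            continue
        if now + runtime <= shadow:   # done before the head job starts
            free -= nodes
            started.append(name)
        elif nodes <= extra:          # runs on nodes the head job won't need
            free -= nodes
            extra -= nodes
            started.append(name)
    return started

print(easy_backfill(0.0, 2, [(50.0, 4), (100.0, 4)],
                    [("A", 5, 30.0), ("B", 2, 200.0), ("C", 1, 200.0), ("D", 1, 10.0)]))
# ['C', 'D']
```

Here 2 nodes are free and running jobs release 4 nodes at t=50 and 4 more at t=100, so the 5-node head job A is reserved for t=50 with 1 spare node. The long 1-node job C runs on that spare node, the short job D finishes before t=50, and B is held back because its 2 nodes would delay A's reservation.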

Section: Scheduling with Backfilling (mentioning; confidence: 99%)

“…Dynamic backfilling allows the scheduler to overrule a previous reservation if introducing a slight delay will improve utilization considerably [11]. Talby and Feitelson presented slack based backfilling, an enhanced backfill scheduler that supports priorities [26].…”

Section: Scheduling with Backfilling (mentioning; confidence: 99%)

Backfilling with lookahead to optimize the packing of parallel jobs

Shmueli and Feitelson (2005)

Journal of Parallel and Distributed Computing


The utilization of parallel computers depends on how jobs are packed together: if the jobs are not packed tightly, resources are lost due to fragmentation. The problem is that the goal of high utilization may conflict with goals of fairness or even progress for all jobs. The common solution is to use backfilling, which combines a reservation for the first job in the interest of progress with packing of later jobs to fill in holes and increase utilization. However, backfilling considers the queued jobs one at a time, and thus might miss better packing opportunities. We propose the use of dynamic programming to find the best packing possible given the current composition of the queue, thus maximizing the utilization on every scheduling step. Simulations of this algorithm, called LOS (Lookahead Optimizing Scheduler), using trace files from several IBM SP parallel systems, show that LOS indeed improves utilization, and thereby reduces the mean response time and mean slowdown of all jobs. Moreover, it is actually possible to limit the lookahead depth to about 50 jobs and still achieve essentially the same results. Finally, we experimented with selecting among alternative sets of jobs that achieve the same utilization. Surprising results indicate that choosing the set at the head of the queue does not necessarily guarantee best performance. Instead, repeatedly selecting the set with the maximal overall expected slowdown boosts performance when compared to all other alternatives checked.
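The packing idea in this abstract can be illustrated with a simplified dynamic program: treat the free processors as knapsack capacity and the queued jobs, limited to a lookahead window (default 50, following the depth mentioned in the abstract), as items whose weight and value are both their processor count. This sketch, under the hypothetical name `max_packing`, omits the reservation for the head job and the tie-breaking among equal packings that the abstract describes, so it is an illustration under stated assumptions, not the published LOS algorithm.

```python
from typing import List

def max_packing(free: int, sizes: List[int], lookahead: int = 50) -> List[int]:
    """Indices of queued jobs (within the lookahead window) that together use the
    most processors without exceeding `free` (0/1 knapsack, weight == value)."""
    window = sizes[:lookahead]
    n = len(window)
    # best[i][c]: most processors usable by a subset of the first i jobs
    # when c processors are available.
    best = [[0] * (free + 1) for _ in range(n + 1)]
    for i, size in enumerate(window, start=1):
        for c in range(free + 1):
            best[i][c] = best[i - 1][c]                       # skip job i-1
            if size <= c:                                     # or take it
                best[i][c] = max(best[i][c], best[i - 1][c - size] + size)
    # Walk the table backwards to recover which jobs were chosen.
    chosen: List[int] = []
    c = free
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= window[i - 1]
    return sorted(chosen)

print(max_packing(10, [6, 5, 5, 3]))  # [1, 2]
```

With 10 free processors and queued jobs of sizes 6, 5, 5 and 3, scanning the queue one job at a time starts the 6- and 3-processor jobs (9 processors used), while the dynamic program picks the two 5-processor jobs and fills all 10, which is the kind of missed packing opportunity the abstract points to.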
