A traditional application scheduler running on a parallel cluster supports only static scheduling, in which the number of processors allocated to an application remains fixed throughout the lifetime of the job. Because job arrival times are unpredictable and resource requirements vary, static scheduling can leave system resources idle, thereby decreasing overall system throughput. In this paper we present a prototype framework called ReSHAPE, which supports dynamic resizing of parallel MPI applications executed on distributed-memory platforms. The framework includes a scheduler that supports resizing of applications, an API that enables applications to interact with the scheduler, and a library that makes resizing viable. Applications executed using the ReSHAPE scheduler framework can expand to take advantage of additional free processors or shrink to accommodate a high-priority application, without being suspended. Experimental results show that the ReSHAPE framework can improve individual job turnaround time and overall system throughput.
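The expand-or-shrink behavior described above can be illustrated with a minimal sketch. This is not the ReSHAPE scheduler or its API: the `Job` type, the `resize_decision` function, and the policy itself are hypothetical, chosen only to show the kind of decision a resizing scheduler makes when free processors or a high-priority request appear.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; field names are assumptions, not ReSHAPE types.
    name: str
    procs: int      # processors currently allocated
    min_procs: int  # smallest allocation the job can run on
    max_procs: int  # largest allocation the job can use


def resize_decision(job: Job, free_procs: int, high_priority_demand: int) -> int:
    """Return a new processor count for a running job (illustrative policy).

    If a high-priority request needs more processors than are free, the
    running job shrinks by the shortfall, but never below min_procs.
    Otherwise the job expands into the spare processors, up to max_procs.
    """
    shortfall = high_priority_demand - free_procs
    if shortfall > 0:
        return max(job.min_procs, job.procs - shortfall)
    spare = free_procs - high_priority_demand
    return min(job.max_procs, job.procs + spare)


if __name__ == "__main__":
    sim = Job(name="sim", procs=16, min_procs=4, max_procs=32)
    print(resize_decision(sim, free_procs=8, high_priority_demand=0))
    print(resize_decision(sim, free_procs=2, high_priority_demand=10))
```

In this sketch the job keeps running in both cases; only its allocation changes, which mirrors the abstract's point that resizing avoids suspending the application.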
Most conventional parallel job schedulers support only static scheduling and therefore cannot modify the number of processors allocated to a parallel application at runtime. The drawbacks of static scheduling can be overcome by scheduling policies that exploit dynamic resizability in distributed-memory parallel applications, together with a scheduler that supports these policies. The scheduler must be capable of adding processors to, and removing processors from, a parallel application at runtime. This ability to resize parallel applications gives a scheduler more options for managing a large cluster. Our ReSHAPE framework includes an application scheduler that supports dynamic resizing of parallel applications. In this paper, we illustrate the impact of dynamic resizability on parallel scheduling. We propose and evaluate new scheduling policies made possible by our ReSHAPE framework. Experimental results show that these scheduling policies significantly improve individual application turnaround time as well as overall cluster utilization.
Large-scale computational science simulations are a dominant component of the workload on modern supercomputers. Efficient use of high-end resources for these large computations is of considerable scientific and economic importance. However, conventional job schedulers limit flexibility in that they are 'static', i.e., the number of processors allocated to an application cannot be changed at runtime. In earlier work, we described ReSHAPE, a system that eliminates this drawback by supporting dynamic resizability in distributed-memory parallel applications. The goal of this paper is to present a case study highlighting the steps involved in adapting a production scientific simulation code to take advantage of ReSHAPE. LAMMPS, a widely used molecular dynamics code, is the test case. Minor extensions to LAMMPS allow it to be resized using ReSHAPE, and experimental results show that resizing significantly improves overall system utilization as well as the performance of an individual LAMMPS job.