Job migration in HPC clusters by means of checkpoint/restart

Rodríguez-Pascual, Manuel; Cao, Jiajun; Moríñigo, José A.; Cooperman, Gene; Mayo-García, Rafael

doi:10.1007/s11227-019-02857-y

Cited by 13 publications

(12 citation statements)

References 40 publications

“…Niu et al [30] have also shown an example of how checkpointing along with preemptive scheduling can increase the performance of the backfill algorithm, and in this work we follow this approach in practice. To the authors' knowledge, aside from [31], this has never been done before.…”

Section: B Scheduling Methods In Slurm (mentioning)

confidence: 98%

“…A recent development of the SLURM scheduler is the seamless incorporation of the Distributed MultiThreaded Checkpointing (DMTCP) [13] library, enabling it to transparently Checkpoint and Restart (C/R) a single-host, parallel or distributed computation. It does so in user-space, with no modifications to user code or the operating system, supporting a variety of HPC languages and infrastructures, including MPI and OpenMP [31].…”

Section: The Optimized Memoryless Fair-share (mentioning)

confidence: 99%

“…We follow Rodriguez et al [31] in basing our work on SLURM because it is the only scheduler which supports C/R in the form of an API. DMTCP is a system-level C/R library which allows to perform C/R operations without any source code modifications; it supports SysV enhancements such as System V shared memory [2] which many MPI implementations employ; and it supports InfiniBand [16].…”

Section: A Technical Scheduler and Transparent C/R (mentioning)

confidence: 99%

“…This is in contrast to BLCR [21], which requires re-compilation and possibly re-tuning of the kernel module and does not support neither SysV enhancements nor InfiniBand, and CRIU [1], which currently does not even have support for parallel or distributed applications. Furthermore, Rodriguez et al [31] provided two sample scheduling algorithms based on their work. These algorithms do not share the same goal as ours -which is mainly to increase system utilization -as one is used to reduce power consumption by scheduling jobs on as least nodes as possible, and the other is used to schedule jobs with higher priority on better hardware.…”

Section: A Technical Scheduler and Transparent C/R (mentioning)

confidence: 99%

Optimized Memoryless Fair-Share HPC Resources Scheduling using Transparent Checkpoint-Restart Preemption

Zvi, Oren

2021

Preprint

Common resource management methods in supercomputing systems usually include hard divisions, capping, and quota allotment. Those methods, despite their 'advantages', have some known serious disadvantages including unoptimized utilization of an expensive facility, and occasionally there is still a need to dynamically reschedule and reallocate the resources. Consequently, those methods involve bad supply-and-demand management rather than a free market playground that will eventually increase system utilization and productivity. In this work, we propose the newly Optimized Memoryless Fair-Share HPC Resources Scheduling using Transparent Checkpoint-Restart Preemption, in which the social welfare increases using a free-of-cost interchangeable proprietary possession scheme. Accordingly, we permanently keep the status-quo in regard to the fairness of the resources distribution while maximizing the ability of all users to achieve more CPUs and CPU hours for longer period without any non-straightforward costs, penalties or additional human intervention.

“…Currently, such techniques use local storage independently on each compute node via a single shared link, but can be complemented to leverage local storage of remote nodes. Additionally, checkpoint-restart techniques are also used for accommodating on-demand jobs with batch jobs [4], [5] and workload migration [6], [7].…”

Section: Related Work (mentioning)

confidence: 99%

Towards Efficient I/O Scheduling for Collaborative Multi-Level Checkpointing

Maurya, Nicolae, Rafique, et al. 2021

2021 29th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)

Efficient checkpointing of distributed data structures periodically at key moments during runtime is a recurring fundamental pattern in a large number of use cases: fault tolerance based on checkpoint-restart, in-situ or post-analytics, reproducibility, adjoint computations, etc. In this context, multilevel checkpointing is a popular technique: distributed processes can write their shard of the data independently to fast local storage tiers, then flush asynchronously to a shared, slower tier of higher capacity. However, given the limited capacity of fast tiers (e.g. GPU memory) and the increasing checkpoint frequency, the processes often run out of space and need to fall back to blocking writes to the slow tiers. To mitigate this problem, compression is often applied in order to reduce the checkpoint sizes. Unfortunately, this reduction is not uniform: some processes will have spare capacity left on the fast tiers, while others still run out of space. In this paper, we study the problem of how to leverage this imbalance in order to reduce I/O overheads for multi-level checkpointing. To this end, we solve an optimization problem of how much data to send from each process that runs out of space to the processes that have spare capacity in order to minimize the amount of time spent blocking in I/O. We propose two algorithms: one based on a greedy approach and the other based on modified minimum cost flows. We evaluate our proposal using synthetic and real-life application traces. Our evaluation shows that both algorithms achieve significant improvements in checkpoint performance over traditional multilevel checkpointing.