2020
DOI: 10.48550/arxiv.2009.08289
Preprint

Extending SLURM for Dynamic Resource-Aware Adaptive Batch Scheduling

Abstract: With the growing constraints on power budget and increasing hardware failure rates, the operation of future exascale systems faces several challenges. Towards this, resource awareness and adaptivity by enabling malleable jobs has been actively researched in the HPC community. Malleable jobs can change their computing resources at runtime and can significantly improve HPC system performance. However, due to the rigid nature of popular parallel programming paradigms such as MPI and lack of support for dynamic re…

Citations: cited by 1 publication (1 citation statement)
References: 26 publications (42 reference statements)
“…Second, the job manager provides a framework for starting, executing, and monitoring jobs on the set of allocated compute nodes. Finally, the scheduler arbitrates contention regarding the computing resources by managing a queue of pending works [11].…”
Section: Operating HPC/AI Converged Cluster (mentioning)
confidence: 99%