SC22: International Conference for High Performance Computing, Networking, Storage and Analysis 2022
DOI: 10.1109/sc41404.2022.00029

Towards Scalable Resource Management for Supercomputers

Abstract: Today's supercomputers offer massive computation resources to execute a large number of user jobs. Effectively managing such large-scale hardware parallelism and workloads is essential for supercomputers. However, existing HPC resource management (RM) systems fail to capitalize on the hardware parallelism by following a centralized design used decades ago. They give poor scalability and inefficient performance on today's supercomputers, which will worsen in exascale computing. We present ESLURM, a better RM fo…

Citations: cited by 4 publications (1 citation statement)
References: 41 publications
“…Slurm (Simple Linux Utility for Resource Management) is an open‐source resource management and job scheduling system known for its fault tolerance and high scalability, making it a popular choice for both large and small Linux clusters [27]. Many of the world's top‐ranked supercomputers employ Slurm to ensure effective management of resources and jobs [28], preventing interference and enhancing execution efficiency. Within this intricate computing environment, each job's execution details are meticulously documented in every line of the job logs.…”
Section: Application Sequence and Framework Design (mentioning)
confidence: 99%