2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)
DOI: 10.1109/icpads53394.2021.00041
Improving HPC System Throughput and Response Time using Memory Disaggregation

Abstract: HPC clusters are cost-effective, well understood, and scalable, but the rigid boundaries between compute nodes may lead to poor utilization of compute and memory resources. HPC jobs may vary by orders of magnitude in memory consumption per core. Thus, even when the system is provisioned with both normal- and large-capacity nodes, a mismatch between the system and the memory demands of the scheduled jobs can lead to inefficient usage of both memory and compute resources. Disaggregated memory has recently b…
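The mismatch described in the abstract can be made concrete with a small calculation. The node size (48 cores, 96 GB) and the per-core memory demands below are hypothetical values chosen only for illustration, not figures from the paper, and `node_usage` is a made-up helper. The sketch packs a single job with a fixed memory-per-core demand onto one node and reports what is left idle.

```python
# Hypothetical node size, chosen only to illustrate the argument.
CORES_PER_NODE = 48
MEM_PER_NODE_GB = 96.0


def node_usage(mem_per_core_gb: float) -> tuple[int, int, float]:
    """Pack one job with a fixed memory-per-core demand onto a single node.

    Returns (busy_cores, idle_cores, idle_memory_gb). The job can use at most
    as many cores as the node's local memory can feed.
    """
    cores_fed_by_memory = int(MEM_PER_NODE_GB // mem_per_core_gb)
    busy = min(CORES_PER_NODE, cores_fed_by_memory)
    idle_mem = MEM_PER_NODE_GB - busy * mem_per_core_gb
    return busy, CORES_PER_NODE - busy, idle_mem


for demand in (1.0, 2.0, 8.0):
    busy, idle, idle_mem = node_usage(demand)
    print(f"{demand:4.1f} GB/core: {busy:2d} cores busy, "
          f"{idle:2d} cores idle, {idle_mem:4.1f} GB idle")
```

With these assumed numbers, a low-demand job (1 GB/core) leaves half of the node's memory idle, while a high-demand job (8 GB/core) exhausts local memory after 12 cores and leaves 36 of the 48 cores idle: the rigid node boundary wastes a different resource in each case.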

Cited by 6 publications (19 citation statements). References 28 publications.
“…We use the BSC simulator extensions for disaggregated memory [9], [10], which adds support for disaggregated memory, modelling the slowdown due to remote memory accesses, and extends Slurm's allocation policy to exploit disaggregated memory. The performance model is a slowdown-based method [22], extended to support MPI processes [9]. The allocation policy first selects nodes that have enough local memory to satisfy the job's requirement of memory per node to avoid unnecessary remote memory access.…”
Section: Slurm Simulator Supporting Disaggregated Memory (mentioning)
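The allocation rule quoted above (satisfy the per-node memory request from local memory first, and touch remote memory only when necessary), together with the general idea of a slowdown-based runtime estimate, can be sketched as follows. This is a hypothetical illustration, not the BSC simulator's code or the model of [22]: the `Node` type, `allocate`, `estimated_runtime`, the largest-local-memory-first fallback order, and the linear slowdown with `max_slowdown = 1.5` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_cores: int
    free_local_mem_gb: float


def allocate(nodes: List[Node], n_nodes: int, cores_per_node: int,
             mem_per_node_gb: float, remote_pool_gb: float
             ) -> Optional[List[Tuple[str, float]]]:
    """Return [(node_name, remote_gb_borrowed), ...], or None if the job cannot start.

    Step 1: prefer nodes whose local memory alone satisfies the per-node
    request, so no remote memory is used. Step 2 (assumed fallback): only if
    that is not enough, take nodes with free cores and borrow each node's
    shortfall from a shared disaggregated memory pool.
    """
    fits = [n for n in nodes if n.free_cores >= cores_per_node]
    local_ok = [n for n in fits if n.free_local_mem_gb >= mem_per_node_gb]
    chosen = [(n.name, 0.0) for n in local_ok[:n_nodes]]

    if len(chosen) < n_nodes:
        # Fallback candidates, most local memory first, to minimise borrowing.
        short = sorted((n for n in fits if n.free_local_mem_gb < mem_per_node_gb),
                       key=lambda n: n.free_local_mem_gb, reverse=True)
        pool = remote_pool_gb
        for n in short:
            if len(chosen) == n_nodes:
                break
            deficit = mem_per_node_gb - n.free_local_mem_gb
            if deficit <= pool:  # skip nodes the remaining pool cannot cover
                chosen.append((n.name, deficit))
                pool -= deficit
    return chosen if len(chosen) == n_nodes else None


def estimated_runtime(base_runtime_s: float, remote_fraction: float,
                      max_slowdown: float = 1.5) -> float:
    """Placeholder linear slowdown: runtime grows with the share of remote memory.

    remote_fraction = 0 gives the local-only runtime; remote_fraction = 1
    gives base_runtime_s * max_slowdown.
    """
    return base_runtime_s * (1.0 + (max_slowdown - 1.0) * remote_fraction)


if __name__ == "__main__":
    cluster = [Node("n1", 48, 96.0), Node("n2", 48, 64.0), Node("n3", 48, 32.0)]
    print(allocate(cluster, 2, 48, 96.0, 64.0))
    print(estimated_runtime(1000.0, 32.0 / 96.0))
```

In the usage example, the two-node job first takes `n1`, whose local memory covers the 96 GB request, and only then takes `n2`, borrowing its 32 GB shortfall from the pool. Ordering the fallback by local memory keeps the borrowed amount, and therefore the modelled slowdown, as small as possible.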
“…It also details the hardware used to run our experiments, the set of single and multi node applications we profiled to create the contention model, and the concept of disaggregated memory employed in this work. Much of the content in this Chapter is presented in our papers [39,40].…”
Section: Outline of Thesis (mentioning)
“…We also detail the methodology applied to generate the workload employed in all our experiments. The methodology detailed in this Chapter is employed in our papers [40][41][42].…”
Section: Outline of Thesis (mentioning)