Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques 2020
DOI: 10.1145/3410463.3414643
RackMem

Abstract: High-performance computing (HPC) clusters suffer from overall low memory utilization, caused by node-centric memory allocation combined with the variable memory requirements of HPC workloads. The recent provisioning of nodes with terabytes of memory to accommodate workloads with extreme peak memory requirements further exacerbates the problem. Memory disaggregation is viewed as a promising remedy to increase overall resource utilization and enable cost-effective up-scaling and efficient operation…

Cited by 5 publications (11 citation statements). References 25 publications.
“…Slurm's default node allocation is the exclusive mode, also known as server-based or node-based allocation. In this mode a job is placed only if a single node can satisfy the requested core and memory resources [55], so even if a job does not use all of the resources within the node, no other job is allowed to share them. In fact, many HPC systems disallow the colocation of different workloads on the same compute node to minimize the negative impact of inter-workload interference [56,10].…”
Section: Slurm Resource Manager
confidence: 99%
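
As a concrete illustration of the exclusive allocation mode the statement above describes, here is a minimal sketch of a Slurm batch script that claims a whole node. The job name, time limit, and application binary are illustrative assumptions, not taken from the paper or the citing works; --exclusive and --mem=0 are standard Slurm options, the former preventing any other job from sharing the allocated node and the latter requesting all memory configured on it.

    #!/bin/bash
    #SBATCH --job-name=exclusive-demo   # hypothetical job name
    #SBATCH --nodes=1                   # one whole compute node
    #SBATCH --exclusive                 # no other job may share this node
    #SBATCH --mem=0                     # request all memory on the node
    #SBATCH --time=00:10:00             # illustrative time limit

    srun ./my_hpc_app                   # hypothetical application binary

Even if the application touches only a fraction of the node's memory, the remainder stays reserved for the job's lifetime; this is the underutilization that motivates memory disaggregation in the abstract above.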
“…Modern HPC systems are therefore typically built from thousands of ccNUMA nodes communicating via a fast interconnect such as InfiniBand or OmniPath. However, they often suffer from memory underutilization [55].…”
Section: Disaggregated Memory
confidence: 99%