Proceedings of the 29th European MPI Users' Group Meeting 2022
DOI: 10.1145/3555819.3555856

Towards Dynamic Resource Management with MPI Sessions and PMIx

Abstract: Job management software on peta- and exascale supercomputers continues to provide static resource allocations, from a program's start until its end. Dynamic resource allocation and management is a research direction that has the potential to improve the efficiency of HPC systems and applications by dynamically adapting the resources of an application during its runtime. Resources can be adapted based on past, current or even future system conditions and matching optimization targets. However, the implementation…

Cited by 17 publications
(7 citation statements)
References 16 publications
“…Monitoring solutions, for example, could be used here to find the right timing for such operations. Applications supporting advanced forms of checkpoint/restart or dynamic load balancing [24], [25] are good examples, being massively parallel and of an iterative nature with prescribed points where resource changes can be accommodated. Also, certain kinds of applications, e.g., those based on task farming or parallel-in-time codes based on the Parareal algorithm [26], can benefit from malleability without significant code changes.…”
Section: A Applications (mentioning)
confidence: 99%
“…II-C), for example, the user-level code must outline the reconfiguration phase and carefully define data-management procedures enabling the remapping of the various operands. Nonetheless, more generic support could, e.g., be devised either inside PETSc [25], which controls the data or task distribution of applications, or directly through MPI [24], with a set of extensions for dynamic resources and AMR.…”
Section: A Applications (mentioning)
confidence: 99%