Proceedings of the 47th International Conference on Parallel Processing 2018
DOI: 10.1145/3225058.3225094
NumaMMA

Abstract: Non-Uniform Memory Access (NUMA) architectures are now common for running High-Performance Computing (HPC) applications. In such architectures, several distinct physical memories are assembled to create a single shared memory. However, because there are several physical memories, access times are not uniform: they depend on the location of the core performing the memory request and on the location of the target memory. Hence, thread and data placement are crucial to efficiently exploi…

Cited by 19 publications (6 citation statements)
References 29 publications
“…The mappings that require detailed profiler/programmer support include: locality (each page is allocated in the node of the cores that will access the page the most), and balance (pages are spread across the nodes in such a way that the total amount of memory accesses to each node is approximately the same). These mappings require profiling the application's access pattern and implicitly assume that the patterns are reasonably stable across different runs and inputs, which has been shown to be a fair assumption for these benchmarks [34,40].…”
Section: Page
Citation type: mentioning
confidence: 99%
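
The two profile-driven mappings described in the statement above can be illustrated with a minimal sketch. It assumes a hypothetical profile giving, for each page, the number of accesses issued from each NUMA node; the names `locality_mapping`, `balance_mapping`, and `profile` are illustrative and do not come from the cited work. The greedy strategy used for the balance mapping is one possible heuristic, not necessarily the method the citing authors used.

```python
from typing import Dict, List

# Hypothetical profile format (an assumption for this sketch):
# access_counts[page][node] = accesses to `page` issued by cores of `node`.
AccessCounts = Dict[int, List[int]]


def locality_mapping(access_counts: AccessCounts) -> Dict[int, int]:
    """Place each page on the node whose cores access it the most."""
    return {
        page: max(range(len(counts)), key=lambda n: counts[n])
        for page, counts in access_counts.items()
    }


def balance_mapping(access_counts: AccessCounts, num_nodes: int) -> Dict[int, int]:
    """Spread pages so each node serves a similar total number of accesses.

    Greedy heuristic (an assumption): visit pages from most to least
    accessed and put each one on the currently least-loaded node.
    """
    load = [0] * num_nodes
    mapping: Dict[int, int] = {}
    pages = sorted(access_counts.items(), key=lambda kv: sum(kv[1]), reverse=True)
    for page, counts in pages:
        target = min(range(num_nodes), key=lambda n: load[n])
        mapping[page] = target
        load[target] += sum(counts)
    return mapping


# Toy profile: four pages, two NUMA nodes.
profile = {0: [90, 10], 1: [5, 45], 2: [30, 20], 3: [2, 48]}
locality = locality_mapping(profile)
balance = balance_mapping(profile, num_nodes=2)
```

Both functions only compute a target node for each page. Applying the placement on a real system would need an OS-level mechanism, which is outside this sketch.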
“…Codelets have been shown to be quite accurate for both microarchitectural evaluation [12] and NUMA configuration studies [35]. This is because parallel regions typically exhibit similar behavior [40]. For our fork-join applications, we extract codelets for instances of each important OpenMP parallel region.…”
Section: Faster Evaluation: Sampling With Codelets
Citation type: mentioning
confidence: 99%