Improving the accessibility of NUMA‐aware C++ application development based on the PGASUS framework

Plauth, Max; Eberhardt, Felix; Grapentin, Andreas; Polze, Andreas

doi:10.1002/cpe.6887

Cited by 3 publications

(1 citation statement)

References 23 publications


“…Unfortunately, this comes at the cost of increasing the design space and introducing a considerable burden on the programmers' shoulders, who now have to avoid remote memory accesses as well as to control thread-to-core pinning [21,33,38]. To partially alleviate this situation, NUMA-aware optimizations have been introduced in most levels of the software stack, including applications [13,43,49], libraries and middleware [32,36], hardware-software co-design of runtime and operating systems [9,24,39], hypervisors [46], and container orchestrators [16].…”

Section: Configurable NUMA Memories (mentioning)

confidence: 99%

Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures

Catalán, Igual, Herrero, et al. 2023

Journal of Parallel and Distributed Computing


Mitigating the NUMA effect on task-based runtime systems

et al. 2023


Processors with multiple sockets or chiplets are becoming more common. These processors usually expose a single shared address space. However, due to hardware restrictions, they adopt a NUMA design, in which each processor accesses its local memory faster than remote memory. Reducing data motion is crucial to overall performance, so computations must run as close as possible to where the data resides. We propose a new approach that mitigates the NUMA effect. Our solution is based on OmpSs-2, a task-based parallel programming model similar to OpenMP. We first provide a simple API to allocate memory in NUMA systems using different policies. Then, combining user-given information that specifies dependences between tasks with information collected in a global directory when allocating data, we extend our runtime library to perform NUMA-aware work scheduling. Our heuristic considers data location, the distance between NUMA nodes, and the load of each NUMA node to seamlessly minimize data motion costs and load imbalance. Our evaluation shows that our NUMA support can significantly mitigate the NUMA effect by reducing the number of remote accesses, thereby improving performance on most benchmarks, reaching up to 2x speedup on a 2-NUMA machine and up to 7.1x on an 8-NUMA machine.
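
The abstract names three inputs to the scheduling heuristic: data location, distance between NUMA nodes, and per-node load. The following is a minimal C++ sketch of how such inputs could be combined into a single cost per node. It is not the OmpSs-2 runtime's actual heuristic, which the abstract does not specify. The names `TaskFootprint`, `pick_node`, and `load_weight`, as well as the linear cost formula, are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical description of one ready task; not an OmpSs-2 type.
struct TaskFootprint {
    // Bytes of the task's data resident on each NUMA node.
    std::vector<std::size_t> bytes_on_node;
};

// Returns the index of the NUMA node with the lowest estimated cost.
//   distance[i][j]: relative cost for node i to access memory on node j
//                   (for example 10 for local, 20 for remote).
//   load[i]:        number of tasks already queued on node i.
//   load_weight:    assumed tuning knob that trades locality against balance.
std::size_t pick_node(const TaskFootprint& task,
                      const std::vector<std::vector<double>>& distance,
                      const std::vector<std::size_t>& load,
                      double load_weight) {
    const std::size_t n = load.size();
    assert(n > 0 && distance.size() == n && task.bytes_on_node.size() == n);

    std::size_t best = 0;
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < n; ++i) {
        assert(distance[i].size() == n);
        // Load term: penalize nodes that already have many queued tasks.
        double cost = load_weight * static_cast<double>(load[i]);
        // Locality term: bytes weighted by the distance to where they reside.
        for (std::size_t j = 0; j < n; ++j) {
            cost += distance[i][j] * static_cast<double>(task.bytes_on_node[j]);
        }
        if (cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}
```

With an idle machine the sketch favors the node that holds most of the task's data. As the load on that node grows, the load term eventually outweighs the locality term and the task is placed elsewhere, which is the locality versus balance trade-off the abstract describes.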
