2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
DOI: 10.1109/ipdps.2010.5470442

Structuring the execution of OpenMP applications for multicore architectures

Abstract: The now commonplace multi-core chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user friendliness of shared memory on the one hand, and memory access scalability and efficiency on the other. However, getting high performance out of such machines requires a dynamic mapping of application tasks and data onto the underlying architecture. Moreover, depending on the application behavior, this mapping should fav…

Cited by 48 publications (32 citation statements)
References 14 publications
“…These mechanisms generate mapping information based on a very small number of samples compared to SAMMU, as all memory accesses are handled by the MMU. Some techniques such as Forest-GOMP [4] require annotations in the source code and depend on specific parallelization libraries. Similarly, Ogasawara [20] proposes a data mapping method that is limited to object-oriented languages.…”
Section: Related Work
confidence: 99%
“…ForestGOMP [5,6] is an OpenMP run-time with a resource-aware scheduler and a NUMA-aware allocator. It introduces three concepts: grouping of OpenMP threads into bubbles, scheduling of threads and bubbles using a hierarchy of runqueues, and migrating data dynamically upon load balancing.…”
Section: Related Work
confidence: 99%
“…On the operating system side, optimizations are compelled to place tasks and data conservatively [13,24], unless provided with detailed affinity information by the application [5,6], high-level libraries [26] or domain specific languages [20]. Nevertheless, as task-parallel run-times operate in user-space, a separate kernel component would add additional complexity to the solution; this advocates for a user-space approach.…”
Section: Introduction
confidence: 99%
“…A library called ForestGOMP is introduced in [Broquedis et al 2010a]. This library integrates into the OpenMP runtime environment and gathers information about the different parallel sections of the applications.…”
Section: Joint Thread and Data Mapping
confidence: 99%