2010
DOI: 10.1007/s10766-010-0136-3

ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures

Cited by 68 publications (62 citation statements)
References 21 publications (30 reference statements)

“…Then, we control the allocation policy to favor the physical memory attached to the thread NUMA domain. Managing NUMA locality while load-balancing workload is typically achieved either through migrating data to workers [3] or assigning workers based on their proximity to data [2]. For Libtensor, a large fraction of the working set is created after task assignment making traditional techniques not directly applicable.…”
Section: A Explicit Locality Management (mentioning)
confidence: 99%
“…The approach taken in the ForestGOMP [25,26] runtime system is particularly relevant to our work. The ForestGOMP approach focuses on two aspects of the memory association problem.…”
Section: Memory Association and Distribution (mentioning)
confidence: 99%
“…These models do not define explicit mappings of data to devices, but rather associate data to threads. Some examples of this include Forest-GOMP [25,26] and UPMLib [74]. Even some approaches explored in the Linux kernel for automatic page migration could fall into this category, mapping data to threads by tracking remote-access faults in the system.…”
Section: Memory Association and Distribution (mentioning)
confidence: 99%
“…ForestGOMP [9] made some modification to STREAM benchmark. STREAM measures sustainable memory bandwidth and the corresponding computation rate for simple vectors.…”
Section: Related Work (mentioning)
confidence: 99%
“…Linux traditionally had ways to bind threads to specific CPUs/Cores and NUMA API extends that to allow programs to specify on which node memory should be allocated. Some more complicated APIs are based on these basic policies, such as MAi [7] and MaMI [9]. It is not an easy task to apply these API because it is much difficult to find the communication pattern in shared memory platform than message passing platform, because it is implicit and occurs through the memory accesses. Recently, some tools are available to guide a program developer on where to judiciously apply these API within a large parallel code [10][11][12].…”
Section: Introduction (mentioning)
confidence: 99%