2009 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
DOI: 10.1109/sbac-pad.2009.16

Memory Affinity for Hierarchical Shared Memory Multiprocessors

Abstract: Currently, parallel platforms based on large-scale hierarchical shared memory multiprocessors with Non-Uniform Memory Access (NUMA) are becoming a trend in scientific High Performance Computing (HPC). Due to their memory access constraints, these platforms require very careful data distribution. Many solutions have been proposed to address this issue. However, most of them do not include optimizations for numerical scientific data (array data structures) or address portability. Besides, these …

Cited by 45 publications (39 citation statements)
References 8 publications
“…This organization, which can be easily implemented with specialized memory allocator routines (the libnuma library, for instance), still contains remote memory accesses, which might become really hindering with larger images. To solve this, we chose to implement a block-cyclic allocation [43] using a memory binding routine, which can perform a remapping of a given memory chunk onto a specific node. The core of the corresponding C code is provided in figure 15.…”
Section: NUMA-aware Adaptation (mentioning)
Confidence: 99%
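
The citing paper's figure 15 is not reproduced here, but the technique it describes, block-cyclic placement via a memory binding routine, can be sketched with the libnuma API the quote itself mentions. The block size, block count, and use of numa_tonode_memory below are illustrative assumptions, not the authors' actual code:

```c
/* Hedged sketch: block-cyclic placement of a buffer across NUMA nodes,
 * in the spirit of the technique described above (not the citing
 * paper's figure 15 code). Requires libnuma; link with -lnuma. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    size_t page  = (size_t)sysconf(_SC_PAGESIZE);
    int    nodes = numa_max_node() + 1;

    /* Block size of 4 pages and a count of 64 blocks are assumptions;
     * in practice they would be tuned to the application's access
     * pattern (e.g., image tile size). */
    size_t block = 4 * page;
    size_t nblks = 64;
    size_t total = nblks * block;

    /* Allocate page-aligned memory, then bind each block to a node in
     * round-robin (block-cyclic) order; pages land on their assigned
     * node when first touched. */
    char *buf = aligned_alloc(page, total);
    if (!buf)
        return EXIT_FAILURE;

    for (size_t i = 0; i < nblks; i++)
        numa_tonode_memory(buf + i * block, block, (int)(i % nodes));

    /* First touch: fault the pages in so the binding takes effect. */
    for (size_t i = 0; i < total; i += page)
        buf[i] = 0;

    free(buf);
    return EXIT_SUCCESS;
}
```

Binding blocks rather than whole arrays is what makes the distribution cyclic: consecutive chunks rotate across nodes, so no single node serves all remote accesses for a large image.
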
“…Memphis [11] evaluated its effectiveness by applying the NPB (NAS Parallel Benchmarks), HYCOM (a production ocean modeling application), XGC1 (a production Fortran90 particle-in-cell code that models several aspects of plasmas in a tokamak thermonuclear fusion reactor), and CAM (the Community Atmosphere Model). MAi [7] used two kernels (FFT and CG) from NPB and ICTM [15]. SPLASH2, PARSEC, and Advection (a part of the Brazilian Regional Atmosphere Modeling System) were used in [13].…”
Section: Related Work (mentioning)
Confidence: 99%
“…This means that multithreaded codes on a NUMA platform should sustain sufficient locality of memory accesses and minimize accesses to remote data to obtain high performance. The importance of data locality is well documented [1][2][3][4], and there are OS-provided NUMA APIs to control it [5][6][7][8]. Linux has traditionally had ways to bind threads to specific CPUs/cores, and the NUMA API extends that to allow programs to specify on which node memory should be allocated.…”
Section: Introduction (mentioning)
Confidence: 99%
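
As a hedged illustration of the OS-provided facilities that quote refers to, the following Linux sketch combines traditional thread binding (sched_setaffinity) with node-local allocation through libnuma. The CPU choice and allocation size are assumptions for demonstration, not values from any cited paper:

```c
/* Hedged sketch, assuming Linux with libnuma: pin the calling thread
 * to a CPU, then allocate its working set on that CPU's local NUMA
 * node. Link with -lnuma. */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0)
        return EXIT_FAILURE;

    int cpu = 0;                       /* illustrative CPU choice */

    /* Traditional Linux affinity: bind the calling thread to one CPU. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);

    /* NUMA API extension: allocate on the node local to that CPU. */
    int    node = numa_node_of_cpu(cpu);
    size_t len  = 1 << 20;             /* 1 MiB, illustrative */
    double *data = numa_alloc_onnode(len, node);
    if (!data)
        return EXIT_FAILURE;

    printf("CPU %d -> NUMA node %d\n", cpu, node);
    numa_free(data, len);
    return EXIT_SUCCESS;
}
```
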
“…Threads that access a large amount of shared data should be mapped to cores that are close to each other in the memory hierarchy, while data should be mapped to the same NUMA node on which the threads that access it are executing [22]. In this way, the locality of the memory accesses is improved, which leads to increased performance and energy efficiency.…”
Section: Introduction (mentioning)
Confidence: 99%
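
A minimal sketch of that thread/data co-mapping idea, assuming a Linux system with libnuma and pthreads: each worker pins itself to a core and first-touches its partition on that core's local node. The thread count and partition size are illustrative assumptions, and this is not the mapping mechanism of any specific paper cited above:

```c
/* Hedged sketch of co-locating threads and their data. Requires
 * libnuma and pthreads; link with -lnuma -lpthread. Assumes the
 * machine has at least NTHREADS cores. */
#define _GNU_SOURCE
#include <numa.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

#define NTHREADS   4                   /* illustrative */
#define PART_BYTES (1 << 20)           /* 1 MiB per thread, illustrative */

static void *worker(void *arg)
{
    int cpu = (int)(long)arg;

    /* Pin this thread to its core. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* Place this thread's partition on the node it now runs on. */
    double *part = numa_alloc_onnode(PART_BYTES, numa_node_of_cpu(cpu));
    if (part) {
        /* First touch is node-local, so subsequent accesses are too. */
        for (size_t i = 0; i < PART_BYTES / sizeof(double); i++)
            part[i] = 0.0;
        numa_free(part, PART_BYTES);
    }
    return NULL;
}

int main(void)
{
    if (numa_available() < 0)
        return EXIT_FAILURE;

    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    return EXIT_SUCCESS;
}
```
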