2017
DOI: 10.1145/3040222

A Library for Portable and Composable Data Locality Optimizations for NUMA Systems

Abstract: Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA), in which accesses to remote memory locations take more time than local memory accesses. Optimizing NUMA memory system performance is difficult and costly for three principal reasons: (1) today's programming languages and libraries have no explicit support for NUMA systems, (2) NUMA optimizations are not portable, and (3) optimizations are not composable (i.e., they can become ineffective or worsen performance in environments th…

Cited by 14 publications (11 citation statements); references 44 publications.
“…Most programming languages do not implement thread and data mapping algorithms that can be called from the source code [23]. Thus, the developer is required to know the implementation of the language or of the compiler to be able to map threads efficiently using specific routines [23]. The libraries that provide such routines are TBB-NUMA [23], QThreads [24] and PThreads.…”
Section: Software Libraries For Thread Mapping (mentioning)
confidence: 99%
“…Thus, the developer is required to know the implementation of the language or of the compiler to be able to map threads efficiently using specific routines [23]. The libraries that provide such routines are TBB-NUMA [23], QThreads [24] and PThreads. TBB-NUMA [23] is designed as a portable and composable library for parallel computation.…”
Section: Software Libraries For Thread Mapping (mentioning)
confidence: 99%
“…As corroborated by a large amount of work in operating systems [12,14,18,19,35,61,70,74], databases [36,44,62,63,75], programming languages [37,58], parallel runtimes [7,9,41,52], key-value stores [15,51], and synchronization [22-24, 32, 45], system developers need to optimize software for the target platform to achieve good performance. We discuss below selected examples of multi-core optimizations.…”
Section: Related Work (mentioning)
confidence: 99%
“…Starting with userland methods, it is possible to statically place arrays and computations to bring NUMA awareness to OpenMP programs [27,2,34,32] or applications using TBB [26]. Such approaches are well suited to regular data structures and involve target-specific optimizations by the programmer.…”
Section: Related Work (mentioning)
confidence: 99%
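The static placement of arrays and computations mentioned in the excerpt above is commonly done in OpenMP programs through "first-touch" initialization. The following is a minimal C++ sketch, assuming a Linux-style first-touch page policy (a page is backed on the NUMA node of the thread that first writes it) and an OpenMP-capable compiler. The helper names alloc_untouched, first_touch_init and scaled_sum are hypothetical, and this is neither the method of the cited works [27,2,34,32] nor the API of TBB-NUMA [26].

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helpers sketching OpenMP first-touch placement. Assumes the
// OS places each page on the NUMA node of the thread that first writes it
// (Linux's default local policy). Compile with -fopenmp; without it the
// pragmas are ignored and the loops simply run serially.

// Allocate without initialising the elements, so (for large arrays, which
// the allocator typically maps as fresh pages) no page has been touched yet.
std::unique_ptr<double[]> alloc_untouched(std::size_t n) {
    assert(n > 0);
    return std::unique_ptr<double[]>(new double[n]);  // default-init: no writes
}

// Parallel initialisation: thread t writes, and therefore places, its chunk.
void first_touch_init(double* a, std::size_t n, double value) {
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = value;
    }
}

// Compute loop with the same static schedule and bounds, so each thread
// typically reads the chunk it placed, i.e. mostly node-local memory.
double scaled_sum(const double* a, std::size_t n, double k) {
    double s = 0.0;
    #pragma omp parallel for schedule(static) reduction(+ : s)
    for (std::size_t i = 0; i < n; ++i) {
        s += k * a[i];
    }
    return s;
}
```

Usage: call `auto a = alloc_untouched(n); first_touch_init(a.get(), n, 1.0);` once, then run `scaled_sum(a.get(), n, 2.0)` as often as needed. Accesses stay local only if threads are pinned (for example with OMP_PROC_BIND=true) and both loops use the same thread count. The OpenMP specification guarantees an identical static iteration assignment only for loops bound to the same parallel region. Across separate regions, as in this sketch, implementations typically produce the same assignment, but the specification does not require it. This illustrates why such hand placement is target-specific and suits regular data structures, as the excerpt notes.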
“…On the operating system side, optimizations are compelled to place tasks and data conservatively [13,24], unless provided with detailed affinity information by the application [5,6], high-level libraries [26] or domain specific languages [20]. Nevertheless, as task-parallel run-times operate in user-space, a separate kernel component would add additional complexity to the solution; this advocates for a user-space approach.…”
Section: Introduction (mentioning)
confidence: 99%