2015
DOI: 10.1155/2015/981759

Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Abstract: Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache acce…

Cited by 17 publications (18 citation statements)
References 31 publications
“…Scheduling to improve data locality and minimizing NUMA effects in shared memory task parallel execution is an active research area [6,21,22,23,24,25,26,27] and can also be coupled to energy considerations [28,29]. Left: A small task graph where all accesses (the type is indicated for each task) are assumed to be to the same shared data.…”
Section: Tracking Dependencies Through Data Versioning (mentioning)
confidence: 99%
“…Other Approaches. Muddukrishna et al (2016) use a locality aware runtime and user annotations to distribute data to different NUMA nodes. They introduce work-stealing and work-dealing algorithms that take queue sizes and node distance into account before stealing (dealing) tasks from (to) other nodes.…”
Section: Related Work (mentioning)
confidence: 99%
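
The idea in the statement above, weighing queue sizes and node distance before stealing, can be illustrated with a minimal sketch. This is not the cited paper's algorithm: the function name `pick_victim`, the `min_tasks` threshold, the tie-breaking rule, and the example distance table are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def pick_victim(thief: int,
                queue_sizes: Sequence[int],
                distance: Sequence[Sequence[int]],
                min_tasks: int = 1) -> Optional[int]:
    """Return the node to steal from, or None if no node qualifies.

    Hypothetical policy (an assumption, not the published algorithm):
    among nodes holding at least `min_tasks` queued tasks, prefer the
    nearest node; break distance ties in favor of the longer queue.
    """
    candidates = [n for n in range(len(queue_sizes))
                  if n != thief and queue_sizes[n] >= min_tasks]
    if not candidates:
        return None
    # Sort key: smaller distance first, then larger queue first.
    return min(candidates,
               key=lambda n: (distance[thief][n], -queue_sizes[n]))


# Illustrative 4-node distance table (made-up values; 10 means local).
DISTANCE = [
    [10, 16, 22, 22],
    [16, 10, 22, 22],
    [22, 22, 10, 16],
    [22, 22, 16, 10],
]
```

With this table, a thief on node 0 takes work from node 1 whenever node 1 has queued tasks, and only falls back to the more distant nodes 2 and 3 when node 1 is empty.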
“…A common approach is to build data locality aware compilers [11], e.g. locality aware scheduling of OpenMP tasks on multicore CPUs [9] and mapping nested access patterns on GPUs [8]. Minimising cache misses involves profiling cache traces, moreover trading function inlining with executable size, and managing memory pressure.…”
Section: Data Locality (mentioning)
confidence: 99%