Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages

2014 · DOI: 10.1145/2641764

Abstract: We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems. Task and data placement decisions are based on a static description of the memory hierarchy and on runtime information about inter-task communication. Existing locality-aware scheduling strategies for fine-grained tasks have strong limitations: they are specific to some class of machines or applications, they do not handle task dependences, they requi…
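The abstract's central idea, placing a task according to a static topology description plus runtime information about where its inputs live, can be illustrated with a minimal cost heuristic. This is a sketch under assumptions, not the paper's actual algorithm: the distance matrix, per-node byte counts, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Static description of the memory hierarchy: dist[i][j] is the access
// cost factor between NUMA nodes i and j (in the convention reported by
// `numactl --hardware`: 10 = local, 20 or more = remote).
using DistanceMatrix = std::vector<std::vector<int>>;

// Runtime information: how many bytes of the task's inputs reside on each node.
using InputBytesPerNode = std::vector<std::size_t>;

// Hypothetical placement heuristic: pick the node where executing the task
// minimizes the distance-weighted volume of its input accesses.
int pick_execution_node(const DistanceMatrix& dist,
                        const InputBytesPerNode& input_bytes) {
    int best_node = 0;
    std::uint64_t best_cost = UINT64_MAX;
    for (std::size_t cand = 0; cand < dist.size(); ++cand) {
        std::uint64_t cost = 0;
        for (std::size_t src = 0; src < input_bytes.size(); ++src)
            cost += static_cast<std::uint64_t>(input_bytes[src]) * dist[cand][src];
        if (cost < best_cost) {
            best_cost = cost;
            best_node = static_cast<int>(cand);
        }
    }
    return best_node;
}
```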

Cited by 36 publications, with 32 citation statements. References 33 publications.
“…In earlier work [14], we showed that some of these issues can be mitigated by using work-pushing. Similar to the abstract model discussed above, the approach assumes that tasks communicate through task-private buffers.…”
Section: Weaknesses of Task Parallelism on NUMA Systems (mentioning)
confidence: 94%
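As a rough illustration of the work-pushing idea quoted above, the sketch below pushes a ready task to a worker on the NUMA node that holds its task-private input buffer, rather than keeping it on the producer's own queue. All names here (Task, Worker, push_task) are hypothetical, and the mutex-guarded inbox is a simplification: the runtime described in [14] uses a lock-free queue instead (see the sketch further below).

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct Task {
    int input_node;            // NUMA node holding this task's private input buffer
    // ... task body and metadata omitted ...
};

struct Worker {
    int numa_node;             // node of the core this worker is pinned to
    std::mutex inbox_lock;     // simplification of the lock-free MPSC queue
    std::deque<Task*> inbox;   // tasks pushed here by remote workers
};

// Push a ready task to some worker on the node that owns its inputs;
// fall back to worker 0 if no worker runs on that node.
void push_task(std::vector<Worker>& workers, Task* t) {
    for (Worker& w : workers) {
        if (w.numa_node == t->input_node) {
            std::lock_guard<std::mutex> g(w.inbox_lock);
            w.inbox.push_back(t);
            return;
        }
    }
    std::lock_guard<std::mutex> g(workers[0].inbox_lock);
    workers[0].inbox.push_back(t);
}
```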
“…If work-pushing is enabled, workers can also receive tasks in a dedicated multi-producer single-consumer queue [14]. Our experiments use one worker thread per core.…”
Section: Software Environment (mentioning)
confidence: 99%
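The dedicated multi-producer single-consumer inbox mentioned in this excerpt can be sketched with a classic intrusive MPSC queue in the style popularized by Dmitry Vyukov. This is a generic textbook design, not the cited runtime's actual implementation; any worker may push, and only the owning worker pops.

```cpp
#include <atomic>

struct TaskNode {
    std::atomic<TaskNode*> next{nullptr};
    // ... task payload would go here ...
};

class MpscInbox {
    std::atomic<TaskNode*> head_;  // producers exchange new nodes in here
    TaskNode*              tail_;  // touched only by the single consumer
    TaskNode               stub_;  // dummy node: the list is never structurally empty
public:
    MpscInbox() : head_(&stub_), tail_(&stub_) {}

    // Multi-producer side: a single atomic exchange, callable from any worker.
    void push(TaskNode* n) {
        n->next.store(nullptr, std::memory_order_relaxed);
        TaskNode* prev = head_.exchange(n, std::memory_order_acq_rel);
        prev->next.store(n, std::memory_order_release);
    }

    // Single-consumer side: only the worker owning this inbox calls pop.
    TaskNode* pop() {
        TaskNode* tail = tail_;
        TaskNode* next = tail->next.load(std::memory_order_acquire);
        if (tail == &stub_) {                       // skip over the dummy node
            if (next == nullptr) return nullptr;    // queue empty
            tail_ = next;
            tail  = next;
            next  = next->next.load(std::memory_order_acquire);
        }
        if (next != nullptr) { tail_ = next; return tail; }
        if (tail != head_.load(std::memory_order_acquire))
            return nullptr;                         // a producer is mid-push; retry later
        push(&stub_);                               // re-insert the dummy node
        next = tail->next.load(std::memory_order_acquire);
        if (next != nullptr) { tail_ = next; return tail; }
        return nullptr;
    }
};
```

With one such inbox per worker and one worker thread per core, as the excerpt describes, pushes from remote workers never contend with the owner's local scheduling loop beyond the single exchange on the head pointer.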
“…Since the inputs of a task are the outputs of another task, the location of input data is determined when the producer task executes. This advocates for an enhanced work-pushing technique, building on the algorithm proposed by Drebes et al. (2014) and revising it to work together with deferred allocation: a task is placed according to the location of its input data before memory is allocated for its outputs. This combination of enhanced work-pushing and deferred allocation is fully automatic, application-independent, portable across NUMA machines and transparently adapts to dynamic changes at run time.…”
Section: NUMA-Aware Optimizations (mentioning)
confidence: 99%
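A hedged sketch of the deferred-allocation flow this excerpt describes, assuming Linux libnuma: the task's execution node is derived first from where its inputs already reside, and only then are its output pages bound to that node, so both the producing writes and the co-located consuming reads stay node-local. Everything except the libnuma calls is a made-up placeholder.

```cpp
#include <cstddef>
#include <numa.h>   // Linux libnuma; check numa_available(), link with -lnuma

struct InputRef  { const void* data; std::size_t size; int node; };
struct OutputRef { void* data; std::size_t size; int node; };

// Simplistic stand-in for the real placement heuristic: the node holding
// the largest share of input bytes (assumes node ids below 64).
static int choose_node_from_inputs(const InputRef* in, std::size_t n) {
    std::size_t bytes[64] = {};
    for (std::size_t i = 0; i < n; ++i) bytes[in[i].node] += in[i].size;
    int best = 0;
    for (int nd = 1; nd < 64; ++nd)
        if (bytes[nd] > bytes[best]) best = nd;
    return best;
}

// Deferred allocation: place the task from its inputs first, then bind the
// output pages to that node. numa_alloc_onnode may return nullptr on failure;
// buffers are released later with numa_free(data, size).
OutputRef alloc_output_deferred(const InputRef* inputs, std::size_t n_inputs,
                                std::size_t out_size) {
    int node = choose_node_from_inputs(inputs, n_inputs);
    void* p = numa_alloc_onnode(out_size, node);
    return OutputRef{p, out_size, node};
}
```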