2014
DOI: 10.1109/tpds.2013.104

Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques

Abstract: Current generations of NUMA node clusters feature multicore or manycore processors. Programming such architectures efficiently is a challenge because numerous hardware characteristics have to be taken into account, especially the memory hierarchy. One appealing idea to improve the performance of parallel applications is to decrease their communication costs by matching the communication pattern to the underlying hardware architecture. In this report, we detail the algorithm and techniques proposed to achieve such…

Cited by 91 publications (116 citation statements)
References 31 publications
“…If k = 2 and the two graphs to be partitioned are the application graph and G_p, the method is called dual recursive bipartitioning. Recently, schemes that model the processor graph as a tree have emerged [CLA12] in this algorithmic context and in similar ones [JMT13].…”
Section: Mapping Techniques (mentioning)
confidence: 99%
“…This information is used for placing tasks (usually threads or processes) in an affinity-aware way: matching inter-task affinities (tasks that synchronize/communicate a lot benefit from shorter distance between them) [2], [3]; placing tasks and their target data buffers together (on NUMA nodes close to cores and/or I/O devices that access them) [4].…”
Section: A Why Locality Is Important (mentioning)
confidence: 99%
“…To compute the allocation we use Algorithm 1 that is based on the TreeMatch algorithm [5]. We have adapted it in two ways to our needs.…”
Section: Topology-aware Add-on of ORWL Tasks (mentioning)
confidence: 99%