2018
DOI: 10.1007/978-3-030-10549-5_10

Progress Thread Placement for Overlapping MPI Non-blocking Collectives Using Simultaneous Multi-threading

Abstract: Non-blocking collectives have been proposed so as to allow communications to be overlapped with computation in order to amortize the cost of MPI collective operations. To obtain a good overlap ratio, communications and computation have to run in parallel. To achieve this, different hardware and software techniques exist. Dedicating some cores to run progress threads is one of them. However, some CPUs provide Simultaneous Multi-Threading, which is the ability for a core to have multiple hardware threads running…
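As a rough illustration of the overlap pattern described in the abstract (not taken from the paper itself), the following C/MPI sketch posts a non-blocking MPI_Iallreduce, performs independent computation, and then waits for completion; compute_chunk and the buffer size are hypothetical placeholders:

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical computation that is independent of the collective's result. */
static void compute_chunk(double *data, int n) {
    for (int i = 0; i < n; i++)
        data[i] = data[i] * 1.0001 + 0.5;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    const int N = 1 << 20;                      /* arbitrary example size */
    double *sendbuf = malloc(N * sizeof(double));
    double *recvbuf = malloc(N * sizeof(double));
    double *work    = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { sendbuf[i] = i; work[i] = i; }

    MPI_Request req;
    MPI_Iallreduce(sendbuf, recvbuf, N, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);       /* start the collective */

    compute_chunk(work, N);                     /* overlap with computation */

    MPI_Wait(&req, MPI_STATUS_IGNORE);          /* complete before using recvbuf */

    free(sendbuf); free(recvbuf); free(work);
    MPI_Finalize();
    return 0;
}

Whether real overlap occurs depends on how the MPI library progresses the operation in the background, which is exactly the placement question the paper studies (dedicated cores versus SMT hardware threads for progress threads).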

Cited by 2 publications (2 citation statements)
References 14 publications
“…We found that using a dedicated core is essential to guarantee sufficient progression of MPI messages and achieving our objective of fine-granular reactivity. Similar findings have been reported in [4,6]. In contrast to predictive load balancing, there is no mutual a-priori agreement on the task migration pattern.…”
Section: Communication Infrastructure (supporting)
confidence: 75%
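A minimal sketch of the dedicated-core progression strategy referred to above, assuming Linux pthread affinity and a hypothetical reserved core id PROGRESS_CORE (this is an illustration, not the cited implementation):

#define _GNU_SOURCE
#include <mpi.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

#define PROGRESS_CORE 3   /* hypothetical core reserved for progression */

static MPI_Request g_req;

/* Progress thread pinned to a dedicated core: it polls the outstanding
   request so the MPI library can advance the collective while the
   application keeps computing. Requires MPI_THREAD_MULTIPLE. */
static void *progress_loop(void *arg) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(PROGRESS_CORE, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    int flag = 0;
    while (!flag)
        MPI_Test(&g_req, &flag, MPI_STATUS_IGNORE);
    return NULL;
}

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    const int N = 1 << 20;
    double *in  = calloc(N, sizeof(double));
    double *out = calloc(N, sizeof(double));

    MPI_Iallreduce(in, out, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &g_req);

    pthread_t progress;
    pthread_create(&progress, NULL, progress_loop, NULL);

    /* Independent computation overlapped with the collective. */
    double acc = 0.0;
    for (int i = 0; i < N; i++)
        acc += in[i] * 1.0001;
    (void)acc;

    pthread_join(progress, NULL);   /* the collective is complete once the poller exits */

    free(in); free(out);
    MPI_Finalize();
    return 0;
}

Spinning on MPI_Test keeps one core fully occupied, which is precisely the cost that motivates placing such progress threads on SMT hardware threads rather than on whole dedicated cores.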
“…This option is typically implemented with threads, which handle the status of the non-blocking operations and perform the corresponding progression. The drawback of this strategy is the significant overhead produced by the progression threads [26,27,28,29]. Manual progression is generally independent of the hardware and MPI library implementation, but needs some user effort to add MPI Test or MPI Probe calls to progress the communications.…”
Section: Global Communications (mentioning)
confidence: 99%
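For contrast, a sketch of the manual progression alternative mentioned in the quote, where the application interleaves MPI_Test calls with its own computation instead of relying on a progress thread (compute_block and the blocking factor are hypothetical):

#include <mpi.h>

/* Hypothetical kernel operating on one block of the iteration space. */
static void compute_block(double *work, int start, int len) {
    for (int i = start; i < start + len; i++)
        work[i] = work[i] * 0.5 + 1.0;
}

/* Manual progression: regular MPI_Test calls give the library opportunities
   to advance the non-blocking collective without a dedicated thread or core. */
void overlap_with_manual_progression(double *sendbuf, double *recvbuf,
                                     double *work, int n, int block) {
    MPI_Request req;
    MPI_Iallreduce(sendbuf, recvbuf, n, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    int flag = 0;
    for (int start = 0; start < n; start += block) {
        int len = (start + block <= n) ? block : n - start;
        compute_block(work, start, len);
        if (!flag)
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);   /* progress the collective */
    }
    if (!flag)
        MPI_Wait(&req, MPI_STATUS_IGNORE);              /* ensure completion */
}

This avoids the progress-thread overhead the citing paper points out, at the cost of scattering MPI_Test calls through the user code.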