2018 IEEE/ACM Parallel Applications Workshop, Alternatives to MPI (PAW-ATM) 2018
DOI: 10.1109/paw-atm.2018.00006

Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows

Abstract: Collective operations are commonly used in various parts of scientific applications. Especially in strong-scaling scenarios, collective operations can negatively impact the overall application performance: while the load per rank decreases with increasing core counts, the time spent in, e.g., barrier operations increases logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations, such as Allreduce and Allgather(V), by leveraging notified communication…
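
The abstract's central ingredient, notified communication inside node-local shared windows, can be pictured with a small shared-memory sketch. The C snippet below is an illustrative assumption rather than the paper's implementation or API: it emulates a "write plus notify" between two POSIX threads sharing a slot, where the producer writes a payload and then publishes a notification flag with release semantics, and the consumer waits on the flag with acquire semantics before reading. The struct name notified_slot_t and the single-slot layout are hypothetical.

```c
/* Illustrative sketch only: emulates "notified communication" inside a
 * node-local shared window using C11 atomics and two POSIX threads.
 * The buffer layout (payload + notification flag) and all names here are
 * assumptions for illustration, not the paper's API. Compile with -pthread. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

typedef struct {
    double     payload;       /* data slot inside the shared window      */
    atomic_int notification;  /* 0 = empty, 1 = payload has been written */
} notified_slot_t;

static notified_slot_t slot = { 0.0, 0 };

/* Producer: write the payload first, then publish the notification.
 * memory_order_release ensures the payload store becomes visible before
 * the flag store, mirroring the ordering a notified write provides.     */
static void *producer(void *arg) {
    (void)arg;
    slot.payload = 42.0;
    atomic_store_explicit(&slot.notification, 1, memory_order_release);
    return NULL;
}

/* Consumer: wait for the notification, then read the payload.
 * memory_order_acquire pairs with the release store above.              */
static void *consumer(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&slot.notification, memory_order_acquire) == 0)
        ;  /* busy-wait; a real library would back off or block */
    printf("received payload %.1f\n", slot.payload);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

The release/acquire pairing stands in for the guarantee that the notification only becomes visible after the associated data has landed; in the paper this role is played by notified communication over the interconnect or a shared window rather than by thread-level atomics.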

Cited by 2 publications (2 citation statements)
References 14 publications
“…This implementation delivered up to a 3x performance boost compared to the best Intel MPI implementation, v5.1.2, on the Salomon IT4I cluster (InfiniBand FDR). We extended this idea to Allgather(V) and Allreduce with an adaptation of the dissemination algorithm [17], achieving 2x-4x performance improvements compared to the best-performing MPI implementations on the Salomon IT4I cluster and the Beskow Cray XC40 cluster at PDC, KTH.…”
Section: Results (mentioning)
confidence: 99%
“…This implementation delivered up to a 3x performance boost compared to the best Intel MPI implementation, v5.1.2, on the Salomon IT4I cluster (InfiniBand FDR). We extended this idea to Allgather(V) and Allreduce with an adaptation of the dissemination algorithm [16], achieving 2x-4x performance improvements compared to the best-performing MPI implementations on the Salomon IT4I cluster and the Beskow Cray XC40 cluster at PDC, KTH.…”
Section: D) Consistent Allreduce (mentioning)
confidence: 99%
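
Both citation statements point to an adaptation of the dissemination algorithm. For reference, the sketch below shows the classic dissemination exchange pattern, here realized as a plain two-sided MPI barrier: with P ranks it completes in ceil(log2 P) rounds, and in round k each rank signals the rank 2^k positions ahead while waiting on the rank 2^k positions behind. This is a generic illustration of the pattern only; the cited work adapts it to Allgather(V) and Allreduce with notified one-sided communication, which this sketch does not implement.

```c
/* Generic illustration of the dissemination exchange pattern (as a barrier)
 * using ordinary two-sided MPI calls. The cited papers adapt this pattern to
 * Allgather(V)/Allreduce with notified communication; this sketch does not. */
#include <mpi.h>
#include <stdio.h>

static void dissemination_barrier(MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* ceil(log2(size)) rounds: in round k, exchange with ranks at distance 2^k. */
    for (int dist = 1; dist < size; dist <<= 1) {
        int to   = (rank + dist) % size;         /* rank we signal            */
        int from = (rank - dist + size) % size;  /* rank we wait for          */
        char send_token = 0, recv_token;
        MPI_Sendrecv(&send_token, 1, MPI_CHAR, to,   0,
                     &recv_token, 1, MPI_CHAR, from, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    dissemination_barrier(MPI_COMM_WORLD);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d passed the dissemination barrier\n", rank);
    MPI_Finalize();
    return 0;
}
```

One reason the dissemination pattern is attractive for collectives is that it does not require a power-of-two rank count: the round count remains ceil(log2 P) for any P.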