2015
DOI: 10.1145/2686882

Collective Algorithms for Multiported Torus Networks

Abstract: Modern supercomputers with torus networks allow each node to simultaneously pass messages on all of its links. However, most collective algorithms are designed to use only one link at a time. In this work, we present novel multiported algorithms for the scatter, gather, all-gather, and reduce-scatter operations. Our algorithms can be combined to create multiported reduce, all-reduce, and broadcast algorithms. Several of these algorithms involve a new technique where we relax the MPI message-ordering constraint…
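The abstract says the four basic collectives can be combined into reduce, all-reduce, and broadcast. A common way to do this, which may or may not match the paper's exact schedules, is all-reduce = reduce-scatter followed by all-gather, broadcast = scatter followed by all-gather, and reduce = reduce-scatter followed by gather. The Python sketch below shows the first two compositions under strong simplifying assumptions. It simulates only which data ends up on which of `p` nodes (held in a list), and it does not model torus links, multiple ports, or timing. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def reduce_scatter(inputs: List[Vector]) -> List[Vector]:
    """Node i ends up owning the element-wise sum of block i of every input.

    Assumption: the vector length is divisible by the number of nodes p.
    """
    p, n = len(inputs), len(inputs[0])
    assert n % p == 0, "vector length must be divisible by node count"
    b = n // p
    return [[sum(vec[i * b + j] for vec in inputs) for j in range(b)]
            for i in range(p)]


def scatter(root_data: Sequence[float], p: int) -> List[Vector]:
    """The root splits its vector into p equal blocks, one per node."""
    b = len(root_data) // p
    return [list(root_data[i * b:(i + 1) * b]) for i in range(p)]


def all_gather(blocks: List[Vector]) -> List[Vector]:
    """Every node receives the concatenation of all nodes' blocks."""
    full = [x for block in blocks for x in block]
    return [list(full) for _ in blocks]


def all_reduce(inputs: List[Vector]) -> List[Vector]:
    # Composition: reduce-scatter, then all-gather.
    return all_gather(reduce_scatter(inputs))


def broadcast(root_data: Sequence[float], p: int) -> List[Vector]:
    # Composition: scatter from the root, then all-gather.
    return all_gather(scatter(root_data, p))


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    print(all_reduce(data))  # [[11.0, 22.0, 33.0, 44.0], [11.0, 22.0, 33.0, 44.0]]
```

Splitting the work this way keeps every node busy in both phases, which is why the decomposition is a natural fit for algorithms that try to use all of a node's links at once.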

Cited by 10 publications (2 citation statements)
References 21 publications
“…All-Reduce is a family of collective operations that allow nodes to efficiently aggregate (e.g. average) their local vectors and distribute the result across all devices [51,52,53]. Unlike parameter servers, All-Reduce assigns equal roles to all devices, making it easier to scale to a large number of homogeneous workers.…”
Section: Distributed Training
“…If N = M^d and there are no node/network failures, Algorithm 1 is equivalent to Torus All-Reduce (Sack & Gropp, 2015), achieving the exact average after d rounds of communication (see Appendix C.1). However, our typical use case is far from this perfect scenario; for example, some groups can have less than M members.…”
Section: Moshpit Averaging
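The quoted claim, that N = M^d nodes reach the exact average after d rounds, can be checked with a small simulation. The sketch below assumes nodes sit on a d-dimensional grid with M nodes per dimension, and that in round k every group of M nodes differing only in coordinate k replaces its values with the group mean. It illustrates the averaging rounds as the citing paper describes them, not the multiported communication schedule of Sack & Gropp. `torus_average` is a hypothetical name, and failures and partial groups are not modelled.

```python
import itertools
from typing import Dict, Tuple

Coord = Tuple[int, ...]


def torus_average(values: Dict[Coord, float], M: int, d: int) -> Dict[Coord, float]:
    """Simulate d rounds of group averaging on an M^d grid of nodes."""
    current = dict(values)
    for k in range(d):
        nxt = {}
        for coord in current:
            # The group for this round: all nodes that differ only in dimension k.
            group = [coord[:k] + (m,) + coord[k + 1:] for m in range(M)]
            nxt[coord] = sum(current[g] for g in group) / M
        current = nxt
    # After round k, each value is the mean over dimensions 0..k, so after
    # d rounds every node holds the global mean.
    return current


if __name__ == "__main__":
    M, d = 2, 2
    coords = list(itertools.product(range(M), repeat=d))
    vals = {c: float(i) for i, c in enumerate(coords)}  # values 0, 1, 2, 3
    print(torus_average(vals, M, d))
    # {(0, 0): 1.5, (0, 1): 1.5, (1, 0): 1.5, (1, 1): 1.5}
```

Each round only ever averages within groups of size M. This is also why groups with fewer than M members, as mentioned in the quote, break the exactness guarantee: their means are no longer weighted consistently with the rest of the grid.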