2018 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/cluster.2018.00014

SALaR: Scalable and Adaptive Designs for Large Message Reduction Collectives

Cited by 24 publications (5 citation statements); citing publications span 2019–2024. References 18 publications.

“…For the synchronous data‐parallel approaches, major frameworks and GPU‐based libraries optimize the communication scheme for such large message size, ie, a bandwidth‐optimal algorithm for Allreduce operation. The work by Bayatpour et al 42 summarized the state‐of‐the‐art designs and features of MPI AllReduce in the literature for large messages that were covered in our evaluation. For example, Baidu‐AllReduce and Segmented Ring implemented the logical ring‐based algorithm.…”
Section: Discussion and Related Work
confidence: 99%
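
For readers unfamiliar with the ring-based algorithm named in the statement above, the following sketch shows the generic reduce-scatter/allgather ring Allreduce in MPI C. It is an assumption-based illustration of that textbook scheme, not the SALaR design nor the code of Baidu-AllReduce or Segmented Ring; the function name ring_allreduce_sum and the divisibility assumption are ours.

/*
 * Illustrative ring Allreduce (sum): reduce-scatter followed by
 * allgather over a logical ring.  Assumption-based sketch only.
 */
#include <mpi.h>
#include <stdlib.h>

/* Assumes count is divisible by the communicator size, for brevity. */
static void ring_allreduce_sum(double *buf, int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int chunk = count / size;
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;
    double *tmp = malloc(chunk * sizeof(double));

    /* Phase 1: reduce-scatter.  After size-1 steps, rank r holds the
     * fully reduced chunk with index (r + 1) % size. */
    for (int step = 0; step < size - 1; step++) {
        int send_idx = (rank - step + size) % size;
        int recv_idx = (rank - step - 1 + size) % size;
        MPI_Sendrecv(buf + send_idx * chunk, chunk, MPI_DOUBLE, right, 0,
                     tmp, chunk, MPI_DOUBLE, left, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < chunk; i++)
            buf[recv_idx * chunk + i] += tmp[i];
    }

    /* Phase 2: allgather.  Circulate the reduced chunks until every
     * rank holds the complete result. */
    for (int step = 0; step < size - 1; step++) {
        int send_idx = (rank + 1 - step + size) % size;
        int recv_idx = (rank - step + size) % size;
        MPI_Sendrecv(buf + send_idx * chunk, chunk, MPI_DOUBLE, right, 0,
                     buf + recv_idx * chunk, chunk, MPI_DOUBLE, left, 0,
                     comm, MPI_STATUS_IGNORE);
    }
    free(tmp);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int count = 8 * size;                     /* divisible by size */
    double *buf = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) buf[i] = 1.0;

    ring_allreduce_sum(buf, count, MPI_COMM_WORLD);
    /* Every element now equals the number of ranks. */

    free(buf);
    MPI_Finalize();
    return 0;
}

Each rank sends and receives 2(p-1)/p of the message in total, which is why this scheme is considered bandwidth-optimal for large messages.
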
“…It is reported that those algorithms can be applied to GPU systems, which is the main target in this work. By contrast, XPMEM‐based Reduction and SALaR algorithms 42 are proposed to support systems with CPU only, ie, based on the shared‐memory architecture. We, hence, do not consider those algorithms for comparison with our proposal.…”
Section: Discussion and Related Work
confidence: 99%
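
As context for the XPMEM-based, shared-memory designs mentioned above, the sketch below shows a portable analogue of a node-level "direct-load" reduction using an MPI-3 shared-memory window rather than XPMEM. The window-based approach, the function name node_shared_reduce_sum, and the leader-sums structure are our illustrative assumptions, not the mechanism of the cited work.

/*
 * Hedged illustration: XPMEM-style designs let a rank reduce directly
 * out of a peer's memory on the same node.  This portable analogue
 * uses an MPI-3 shared-memory window instead of XPMEM.
 */
#include <mpi.h>

/* The node-local leader (rank 0 of the node communicator) sums every
 * local rank's contribution straight out of the shared window. */
static void node_shared_reduce_sum(const double *src, double *dst,
                                   int count, MPI_Comm comm)
{
    MPI_Comm node;
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                        &node);

    int nrank, nsize;
    MPI_Comm_rank(node, &nrank);
    MPI_Comm_size(node, &nsize);

    double *base;
    MPI_Win win;
    MPI_Win_allocate_shared(count * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &base, &win);

    /* Each rank publishes its input in its slice of the window. */
    for (int i = 0; i < count; i++) base[i] = src[i];
    MPI_Win_fence(0, win);

    if (nrank == 0) {
        for (int i = 0; i < count; i++) dst[i] = base[i];
        for (int r = 1; r < nsize; r++) {
            MPI_Aint sz;
            int disp;
            double *peer;
            /* Map the peer's slice into this rank's address space. */
            MPI_Win_shared_query(win, r, &sz, &disp, &peer);
            for (int i = 0; i < count; i++) dst[i] += peer[i];
        }
    }
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node);
}
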
“…Another collective that employs a ring algorithm for large messages is Allreduce. This collective has been heavily studied and continuously gets improved by the academic community [12][13][14] ; it is frequently used in both traditional HPC and DL operations.…”
Section: Motivation
confidence: 99%
“…The approach works well for messages up to 256 Kbytes. The mechanism is improved later for larger message sizes by introducing pipelining [4].…”
Section: Related Work
confidence: 99%
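
To illustrate the pipelining idea referred to in the statement above in generic terms, the sketch below segments a large buffer and keeps a bounded window of nonblocking MPI_Iallreduce operations in flight, so consecutive segments overlap. The segment size, window depth, and function name are illustrative assumptions and do not reproduce the pipelined mechanism of the cited design.

/*
 * Assumption-based sketch of pipelining a large-message reduction:
 * split the buffer into fixed-size segments and keep a small window
 * of nonblocking Allreduce operations in flight.
 */
#include <mpi.h>

#define SEG_DOUBLES 32768   /* 256 KB segments (illustrative) */
#define WINDOW      4       /* segments in flight at once     */

static void pipelined_allreduce_sum(double *buf, int count, MPI_Comm comm)
{
    MPI_Request reqs[WINDOW];
    int inflight = 0;
    int offset = 0;

    while (offset < count) {
        int seg = (count - offset < SEG_DOUBLES) ? count - offset
                                                 : SEG_DOUBLES;

        /* When the window is full, drain it before issuing more. */
        if (inflight == WINDOW) {
            MPI_Waitall(WINDOW, reqs, MPI_STATUSES_IGNORE);
            inflight = 0;
        }
        MPI_Iallreduce(MPI_IN_PLACE, buf + offset, seg, MPI_DOUBLE,
                       MPI_SUM, comm, &reqs[inflight++]);
        offset += seg;
    }
    MPI_Waitall(inflight, reqs, MPI_STATUSES_IGNORE);
}

The bounded window keeps at most a few segments' worth of memory and requests active at a time while still allowing later segments to start before earlier ones finish.
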