2021
DOI: 10.1007/s10586-021-03370-9

Accelerating distributed deep neural network training with pipelined MPI allreduce

Abstract: TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool to train deep neural networks on clusters of computers. HVD in turn utilizes a blocking Allreduce primitive to share information among processes, combined with a communication thread to overlap communication with computation. In this work, we perform a thorough experimental analysis to expose (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and…
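The abstract refers to the blocking MPI Allreduce primitive on which Horovod builds, and to the choice of algorithm used to realize it inside the MPI library. As a point of reference only (this is not code from the paper), a minimal mpi4py sketch of that primitive follows; the buffer contents and the sum reduction are illustrative assumptions.

# Minimal sketch of the blocking MPI Allreduce primitive Horovod relies on
# (illustrative only; not code from the paper).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each process contributes a local "gradient" buffer (hypothetical data).
sendbuf = np.full(4, float(rank), dtype=np.float32)
recvbuf = np.empty_like(sendbuf)

# Blocking Allreduce: every process ends up with the element-wise sum.
# Which internal algorithm the MPI library picks for this call (ring,
# recursive doubling, ...) is the tuning knob the paper analyzes.
comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)

if rank == 0:
    print(recvbuf)  # sum of all ranks' contributions

Run, for example, with: mpirun -np 4 python allreduce_example.py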

Cited by 9 publications (1 citation statement)
References 26 publications
“…Another collective that employs a ring algorithm for large messages is Allreduce. This collective has been heavily studied and continuously gets improved by the academic community [12][13][14]; it is frequently used in both traditional HPC and DL operations.…”
Section: Motivation
confidence: 99%
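The quoted passage mentions the ring algorithm used for large-message Allreduce. Purely as an illustration (not taken from the paper or the citing work), the sketch below simulates the classic two-phase ring allreduce, reduce-scatter followed by allgather, across several ranks inside a single Python process; the function name, chunking, and example data are assumptions.

import numpy as np

def ring_allreduce(buffers):
    """Simulate ring allreduce over a list of equal-length per-rank arrays."""
    n = len(buffers)
    chunks = [list(np.array_split(np.asarray(b, dtype=np.float64), n))
              for b in buffers]

    # Phase 1: reduce-scatter. In step s, rank r forwards chunk (r - s) mod n
    # to its right neighbour, which accumulates it; after n-1 steps, rank r
    # holds the fully reduced chunk (r + 1) mod n.
    for step in range(n - 1):
        msgs = [(r, (r - step) % n, chunks[r][(r - step) % n].copy())
                for r in range(n)]
        for r, idx, data in msgs:
            chunks[(r + 1) % n][idx] += data

    # Phase 2: allgather. The reduced chunks circulate around the ring until
    # every rank holds every chunk.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n].copy())
                for r in range(n)]
        for r, idx, data in msgs:
            chunks[(r + 1) % n][idx] = data

    return [np.concatenate(c) for c in chunks]

# Example: 4 simulated ranks, each contributing an 8-element vector.
ranks = [np.arange(8) * (r + 1) for r in range(4)]
print(ring_allreduce(ranks)[0])  # element-wise sum across all ranks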