2024
DOI: 10.1109/tpds.2023.3331372

US-Byte: An Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning

Yunqi Gao, Bing Hu, Mahdi Boloursaz Mashhadi, et al.

Abstract: The communication bottleneck severely constrains the scalability of distributed deep learning, and efficient communication scheduling accelerates distributed DNN training by overlapping computation and communication tasks. However, existing approaches based on tensor partitioning are not efficient and suffer from two challenges: (1) the fixed number of tensor blocks transferred in parallel cannot necessarily minimize the communication overheads; (2) although the scheduling order that preferentially transmits …
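
For intuition, below is a minimal sketch (not the authors' US-Byte framework) of the generic technique the abstract describes: partitioning a gradient tensor into unequal-sized blocks and overlapping their asynchronous all-reduce communication with ongoing backward computation. The block sizes, function names, and single-process "gloo" setup are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: generic tensor-block partitioning with
# overlapped all-reduce, not the paper's US-Byte scheduler.
import os
import torch
import torch.distributed as dist

def init_single_process_group():
    # Single-process group so the sketch runs standalone; real training
    # would launch one process per worker (e.g. via torchrun).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

def allreduce_in_blocks(grad: torch.Tensor, block_sizes):
    """Launch one asynchronous all-reduce per (unequal-sized) block.

    Returns the pending work handles; the caller can keep computing
    (e.g. backpropagating earlier layers) before waiting on them.
    """
    handles, offset = [], 0
    flat = grad.view(-1)
    for size in block_sizes:
        block = flat[offset:offset + size]  # contiguous slice of the gradient
        handles.append(dist.all_reduce(block, async_op=True))
        offset += size
    return handles

if __name__ == "__main__":
    init_single_process_group()
    grad = torch.randn(1000)
    # Unequal block sizes (assumed values): a fixed, equal split does not
    # always minimize communication overhead, which is the motivation
    # stated in the abstract.
    handles = allreduce_in_blocks(grad, block_sizes=[100, 300, 600])
    # ... backward computation for earlier layers would overlap here ...
    for h in handles:
        h.wait()  # synchronize before the optimizer step
    dist.destroy_process_group()
```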

Cited by 0 publications.
References 40 publications.