2022
DOI: 10.1109/tpds.2022.3201531

Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion

Abstract: This paper proposes DisCo, an automatic deep learning compilation module for data-parallel distributed training. Unlike most deep learning compilers that focus on training or inference on a single device, DisCo optimizes a DNN model for distributed training over multiple GPU machines. Existing single-device compilation strategies do not work well in distributed training, due mainly to the communication inefficiency they incur. DisCo generates optimized, joint computation operator and communication tensor fusi…
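To make the "communication tensor fusion" idea concrete, below is a minimal illustrative sketch in PyTorch of the baseline technique DisCo builds on: packing many small gradient tensors into one buffer so a single all-reduce replaces many small, latency-bound ones. This is not DisCo's algorithm; the function names (fused_allreduce, _flush), the fixed 25 MB bucket threshold, and the use of torch.distributed are assumptions made here for illustration, and it presumes an already-initialized process group.

```python
# Illustrative sketch only: fixed-size gradient bucketing for data-parallel
# training. DisCo instead searches for fusion choices jointly with operator
# fusion; this snippet just shows why fusing tensors reduces communication cost.
import torch
import torch.distributed as dist


def fused_allreduce(grads, bucket_bytes=25 * 1024 * 1024):
    """All-reduce the gradient tensors in fused buckets of ~bucket_bytes each."""
    bucket, size = [], 0
    for g in grads:
        bucket.append(g)
        size += g.numel() * g.element_size()
        if size >= bucket_bytes:
            _flush(bucket)
            bucket, size = [], 0
    if bucket:
        _flush(bucket)


def _flush(bucket):
    # Flatten the bucket into one contiguous buffer, reduce it with a single
    # collective, then scatter the averaged values back into the original grads.
    flat = torch.cat([g.reshape(-1) for g in bucket])
    dist.all_reduce(flat, op=dist.ReduceOp.SUM)
    flat /= dist.get_world_size()
    offset = 0
    for g in bucket:
        g.copy_(flat[offset:offset + g.numel()].view_as(g))
        offset += g.numel()
```

In this sketch the bucket size is a fixed hand-tuned constant; the abstract's point is that DisCo instead decides tensor fusion jointly with computation operator fusion when compiling the model for multi-GPU training.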

Cited by 4 publications
References 39 publications (53 reference statements)