2020
DOI: 10.1002/cpe.5923
Optimizing execution for pipelined‐based distributed deep learning in a heterogeneously networked GPU cluster

Abstract: Exorbitant resources (computing and memory) are required to train a deep neural network (DNN). Researchers often turn to distributed parallel training to train larger models faster on GPUs. This approach has drawbacks, however: a GPU's greater compute capacity also creates larger bottlenecks in inter-GPU communication during model training, and multi-GPU systems introduce complex connectivity. Workload schedulers then end up having to consider hardware topology and…

Cited by 9 publications (1 citation statement)
References 22 publications
“…In addition, two rpc.remote functions are used to call the two shards on two RPC workers so they can be accessed in the forward pass. [31,32].…”
Section: Proposed Hybrid Pipeline
Confidence: 99%