2020
DOI: 10.48550/arxiv.2005.14038
Preprint

HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism

Abstract: Deep Neural Network (DNN) models have continuously been growing in size in order to improve the accuracy and quality of the models. Moreover, for training of large DNN models, the use of heterogeneous GPUs is inevitable due to the short release cycle of new GPU architectures. In this paper, we investigate how to enable training of large DNN models on a heterogeneous GPU cluster that possibly includes whimpy GPUs that, as a standalone, could not be used for training. We present a DNN training system, HetPipe (H…
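The abstract is truncated, but the idea named in the title (pipelined model parallelism inside a group of weaker GPUs, data parallelism across such groups) can be pictured with a minimal structural sketch. Everything below is an illustrative assumption, not HetPipe's actual code: the `VirtualWorker` and `ParameterServer` names, the GPU labels, and the fake gradient are all made up for the example.

```python
# Minimal structural sketch (assumed names, not the authors' code): several
# "whimpy" GPUs form one virtual worker that runs pipelined model parallelism
# (PMP) internally, while multiple virtual workers train data-parallel (DP)
# copies of the model through a parameter server.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParameterServer:
    """Holds the global weights shared by all virtual workers (DP side)."""
    weights: Dict[str, float] = field(default_factory=lambda: {"w": 0.0})

    def push(self, grads: Dict[str, float], lr: float = 0.1) -> None:
        # Simple SGD update with the gradients sent by one virtual worker.
        for k, g in grads.items():
            self.weights[k] = self.weights.get(k, 0.0) - lr * g

    def pull(self) -> Dict[str, float]:
        return dict(self.weights)


@dataclass
class VirtualWorker:
    """One virtual worker = several (possibly weak) GPUs forming a pipeline (PMP side)."""
    gpus: List[str]  # e.g. ["K80", "K80"]; labels are illustrative only

    def train_minibatch(self, weights: Dict[str, float]) -> Dict[str, float]:
        # The model would be partitioned into len(self.gpus) pipeline stages and
        # micro-batches streamed through them; here we only fake a gradient.
        return {"w": 1.0 / len(self.gpus)}


ps = ParameterServer()
workers = [VirtualWorker(["K80", "K80"]), VirtualWorker(["P100", "V100"])]

for step in range(3):
    for vw in workers:                        # in HetPipe these run concurrently
        grads = vw.train_minibatch(ps.pull())  # PMP inside the virtual worker
        ps.push(grads)                         # DP across virtual workers
print(ps.weights)
```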

Cited by 1 publication (4 citation statements)
References 24 publications
“…Parameter staleness-free
Mesh-TensorFlow [14], Megatron-LM [10]   Tensor   Yes   Manual   No    Yes
OptCNN [15], FlexFlow [16], Tofu [17]    Tensor   Yes   Auto     No    Yes
GPipe [11]                               Graph    No    Manual   No    Yes
AMPNet [18], XPipe [19]                  Graph    No    Manual   No    No
PipeDream [8], SpecTrain [20]            Graph    Yes   Auto     No    No
PipeDream-2BW [21], HetPipe [22]         Graph    Yes   Auto     Yes   No
RaNNC (Ours)                             Graph    Yes   Auto     Yes   Yes
In graph partitioning, such tasks are regarded as atomic and cannot be further partitioned. Unfortunately, when the partitioned subcomponents to be computed on different accelerator devices have sequential dependencies, only one accelerator device can be used at a time.…”
Section: Memory Estimation (mentioning)
confidence: 99%
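The point in this statement that sequential dependencies leave only one accelerator busy at a time can be made concrete with a toy schedule. The sketch below is my own illustration under an idealized one-time-unit-per-stage assumption; it is not code from RaNNC, HetPipe, or GPipe.

```python
# Toy timeline: with a model split into sequential stages on different devices
# and no micro-batching, only one device works at any time, so utilization is
# 1 / num_stages. The scheduling model below is an assumption for illustration.

num_stages = 4        # model partitions, one per accelerator
num_microbatches = 1  # no pipelining: the whole mini-batch moves stage by stage

# Each (stage, micro-batch) pair takes one time unit; stage s can start
# micro-batch m only after stage s-1 has finished it (forward-pass dependency).
finish = {}
for m in range(num_microbatches):
    for s in range(num_stages):
        ready = max(finish.get((s - 1, m), 0), finish.get((s, m - 1), 0))
        finish[(s, m)] = ready + 1

makespan = max(finish.values())
busy = num_stages * num_microbatches
print(f"utilization = {busy}/{num_stages * makespan} = "
      f"{busy / (num_stages * makespan):.2f}")
# With num_microbatches = 1 this prints 0.25; raising it to 8 (GPipe-style
# micro-batching) pushes utilization toward 1.
```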
“…As mentioned in the previous section, some previous works [18], [19], [8], [20], [21], [22] employed asynchronous pipeline parallelism, which suffers from parameter staleness issues [9]. Such issues are caused by computing a mini-batch using different versions of parameters across stages.…”
Section: Memory Estimation (mentioning)
confidence: 99%
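The staleness issue quoted above can also be shown with a small sketch. The schedule below is an idealized 1F1B-style steady state of my own construction (the formula and the `weight_version_at_forward` helper are assumptions, not taken from PipeDream or HetPipe); it only demonstrates that one mini-batch's forward pass meets a different weight version on each stage.

```python
# Toy illustration of parameter staleness in asynchronous pipeline parallelism:
# each stage applies weight updates as soon as a backward pass finishes, so the
# forward pass of a single mini-batch sees a different weight version per stage.

num_stages = 4

def weight_version_at_forward(stage: int, minibatch: int) -> int:
    """Number of updates a stage has already applied when it starts the forward
    pass of `minibatch`, in an idealized 1F1B steady state (0-indexed)."""
    return max(0, minibatch - (num_stages - 1 - stage))

minibatch = 6
versions = [weight_version_at_forward(s, minibatch) for s in range(num_stages)]
print(f"mini-batch {minibatch} runs its forward pass on weight versions {versions}")
# Prints [3, 4, 5, 6]: each stage uses a different (stale) parameter version,
# which is the inconsistency the citing paper points out; PipeDream's weight
# stashing and HetPipe's bounded-staleness synchronization are two ways to tame it.
```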