2018
DOI: 10.48550/arxiv.1806.03377
Preprint

PipeDream: Fast and Efficient Pipeline Parallel DNN Training

Abstract: PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes computation by pipelining execution across multiple machines. Its pipeline-parallel computing model avoids the slowdowns faced by data-parallel training when large models and/or limited network bandwidth induce high communication-to-computation ratios. PipeDream reduces communication by up to 95% for large DNNs relative to data-parallel training, and allows perfect overlap of communication and computation. PipeDream keeps all av…

Cited by 46 publications (75 citation statements)
References 23 publications
“…On the other hand, research on model parallelism studies how to allocate model parameters and training computation across compute units in a cluster to maximize training throughput and minimize communication overheads. Optimizations have been proposed for both operation partitioning approaches [37,74,75,92] and pipeline parallel approaches [32,34,84]. Recently, approaches that combine both data and model parallelism have also been proposed [51,67,70].…”
Section: Distributed Deep Learning
confidence: 99%
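The layer-partitioning idea described in this statement can be illustrated with a minimal sketch. The code below is not the partitioning algorithm of any cited system; it is a hand-written two-stage split of a toy PyTorch model across two GPUs (the device names, layer sizes, and split point are illustrative assumptions), showing the inter-stage activation transfer that such partitioning schemes try to minimize.

```python
# Minimal sketch of layer-wise model parallelism, assuming two CUDA devices
# ("cuda:0", "cuda:1") are available; sizes and the two-stage split are
# illustrative choices, not taken from the cited papers.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First part of the layers lives on device 0, the rest on device 1.
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Sequential(nn.Linear(4096, 10)).to("cuda:1")

    def forward(self, x):
        # Activations are copied between devices at the partition boundary;
        # this inter-stage transfer is the communication that partitioning
        # algorithms aim to keep small.
        h = self.stage0(x.to("cuda:0"))
        return self.stage1(h.to("cuda:1"))

model = TwoStageModel()
out = model(torch.randn(32, 1024))   # forward crosses both devices
out.sum().backward()                 # backward flows stage1 -> stage0
```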
“…Pipeline Parallelism: To accelerate the distributed training process, PipeDream (Harlap et al., 2018; Narayanan et al., 2019, 2021) and GPipe (Huang et al., 2019) propose pipelined model parallelism so that multiple input data can be pushed through all the available workers in sequential order. To be specific, PipeDream pipelines the execution of forward passes and intersperses them with backward passes in an attempt to minimize the processor idle time.…”
Section: Related Work
confidence: 99%
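The "one forward, one backward" interleaving described in this statement can be sketched as a schedule alone, with no real training. The function below is an assumed reconstruction of a PipeDream-style 1F1B ordering, not the authors' implementation; the stage and micro-batch counts are illustrative.

```python
# Schedule-only sketch of 1F1B ("one forward, one backward") pipelining.
# It prints, per pipeline stage, the order in which micro-batch forward (F)
# and backward (B) work items run. No tensors are involved.
def one_f_one_b_schedule(num_stages: int, num_microbatches: int, stage: int):
    """Return the work-item order for one pipeline stage."""
    # Warm-up: deeper stages run fewer initial forwards before the first
    # backward arrives, which keeps every stage busy in steady state.
    warmup = min(num_stages - stage, num_microbatches)
    order, fwd, bwd = [], 0, 0
    for _ in range(warmup):
        order.append(f"F{fwd}"); fwd += 1
    # Steady state: alternate one backward with one forward.
    while fwd < num_microbatches:
        order.append(f"B{bwd}"); bwd += 1
        order.append(f"F{fwd}"); fwd += 1
    # Cool-down: drain the remaining backwards.
    while bwd < num_microbatches:
        order.append(f"B{bwd}"); bwd += 1
    return order

for s in range(4):  # 4 stages, 8 micro-batches (illustrative)
    print(f"stage {s}:", " ".join(one_f_one_b_schedule(4, 8, s)))
```

Printing the schedule shows the warm-up forwards, the steady-state alternation of forwards and backwards, and the cool-down backwards that together reduce processor idle time.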
“…To improve the training efficiency, various parallelization techniques such as data-parallelism (Iandola et al., 2016), model-parallelism (Dean et al., 2012), and a combination of both (Paine et al., 2013; Harlap et al., 2018) have been proposed to reduce the training runtime. Unfortunately, none of these methods could fully overcome the scalability barrier created by the intrinsically serial propagation of data within the network itself (Günther et al., 2020), thereby forcing the distributed machines to work synchronously and preventing us from fully leveraging the computing resources.…”
Section: Introduction
confidence: 99%
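For concreteness, the synchronous data-parallel baseline contrasted in this statement can be sketched in a single process: each replica computes gradients on its own shard of the batch, the gradients are averaged (the all-reduce step whose cost PipeDream targets), and every replica applies the same update. This is a toy simulation, not any cited system's implementation; the model size, replica count, and learning rate are illustrative assumptions.

```python
# Single-process sketch of synchronous data parallelism with a manual
# "all-reduce" (gradient averaging) across replica copies of one model.
import torch
import torch.nn as nn

replicas = [nn.Linear(512, 512) for _ in range(4)]
for r in replicas[1:]:                      # start from identical weights
    r.load_state_dict(replicas[0].state_dict())

batch = torch.randn(32, 512)
shards = batch.chunk(len(replicas))         # each replica gets one shard

for r, x in zip(replicas, shards):
    r(x).pow(2).mean().backward()           # local forward/backward

# "All-reduce": average each parameter's gradient across replicas.
for grads in zip(*(r.parameters() for r in replicas)):
    mean_grad = torch.stack([p.grad for p in grads]).mean(dim=0)
    for p in grads:
        p.grad = mean_grad.clone()

for r in replicas:                          # identical synchronous update
    with torch.no_grad():
        for p in r.parameters():
            p -= 0.01 * p.grad
```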
“…To maximise performance on the IPU, it becomes important to keep as much of the working memory (for example, activation state) on-chip. This naturally promotes the use of much smaller batches, memory-saving optimisations (Chen et al., 2016; Gruslys et al., 2016), and innovative forms of distributed processing (Harlap et al., 2018; Huang et al., 2019; Ben-Nun & Hoefler, 2018; Shazeer et al., 2018). At the same time, it does require reconsidering the use of Batch Normalization (Ioffe & Szegedy, 2015), the most common normalization method in vision models, which relies on large batches.…”
Section: Hardware Considerations
confidence: 99%
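The memory-saving optimisation cited above (Chen et al., 2016) is activation recomputation, which can be sketched with PyTorch's torch.utils.checkpoint. This is only one possible realisation of the technique; the layer sizes and batch size below are illustrative assumptions, and a reasonably recent PyTorch is assumed for the use_reentrant flag.

```python
# Minimal sketch of activation recomputation ("checkpointing"): activations
# inside the wrapped block are not stored during the forward pass and are
# recomputed during the backward pass, trading extra compute for less
# working memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
head = nn.Linear(1024, 10)

x = torch.randn(8, 1024, requires_grad=True)   # small batch, as on-chip memory favours
h = checkpoint(block, x, use_reentrant=False)  # block's inner activations are recomputed
loss = head(h).sum()
loss.backward()                                # gradients match the non-checkpointed version
```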