2021
DOI: 10.48550/arxiv.2105.04663
Preprint

GSPMD: General and Scalable Parallelization for ML Computation Graphs

Abstract: We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computation graphs. It allows users to write programs in the same way as for a single device, then give hints through a few annotations on how to distribute tensors, based on which GSPMD will parallelize the computation. Its representation of partitioning is simple yet general, allowing it to express different or mixed paradigms of parallelism on a wide variety of models. GSPMD infers the partitioning for every operator…
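The annotation-based workflow the abstract describes can be illustrated with JAX, whose sharding annotations are lowered to GSPMD inside the XLA compiler. The sketch below is not taken from the paper; the 8-device 2x4 mesh, the axis names "data" and "model", and the layer shapes are illustrative assumptions.

import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Illustrative assumption: 8 devices arranged as a 2x4 mesh with axis
# names "data" and "model"; the layer shapes are arbitrary.
devices = np.array(jax.devices()).reshape(2, 4)
mesh = Mesh(devices, axis_names=("data", "model"))

@jax.jit
def layer(x, w):
    # A single user hint: keep the activation sharded along the batch
    # ("data") mesh axis. The compiler propagates shardings to the other
    # tensors and inserts whatever collectives the partitioning requires.
    x = jax.lax.with_sharding_constraint(x, NamedSharding(mesh, P("data", None)))
    return jnp.dot(x, w)

y = layer(jnp.ones((8, 1024)), jnp.ones((1024, 1024)))

The point of the sketch is the division of labor: the user writes single-device code plus a few sharding hints, and the compiler handles partitioning the rest of the graph.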

Cited by 19 publications (34 citation statements)
References 17 publications
“…Communication is thus required to fetch the input data from other devices. When the tensors are partitioned evenly, i.e., SPMD [52], all devices follow the same collective communication patterns such as all-reduce, all-gather, and all-to-all. Pipeline parallelism.…”
Section: Conventional View of ML Parallelism (mentioning, confidence: 99%)
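As a concrete illustration of the collectives the quoted passage mentions, the following sketch (not from the paper) partitions a matmul's contraction dimension across devices, so each device computes a partial product and an all-reduce combines them. The mesh axis name "x" and the use of JAX's shard_map are assumptions made for illustration.

import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

# Illustrative: a 1-D mesh over all available devices, axis named "x".
mesh = Mesh(jax.devices(), axis_names=("x",))

def partial_matmul(a_block, b_block):
    # Each device holds a [M, K/x] slice of A and a [K/x, N] slice of B,
    # computes a partial [M, N] product, then all-reduces over the axis.
    return jax.lax.psum(a_block @ b_block, axis_name="x")

sharded_matmul = shard_map(
    partial_matmul,
    mesh=mesh,
    in_specs=(P(None, "x"), P("x", None)),  # shard both operands on the contraction dim
    out_specs=P(None, None),                # replicated result
)

On one device the all-reduce is a no-op; on a mesh of several devices it is exactly the all-reduce pattern the citation describes for evenly partitioned (SPMD) tensors.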
“…Manual combination of parallelisms. Recent development shows the approaches mentioned above need to be combined to scale out today's large DL models [36,52]. The state-of-the-art training systems, such as Megatron-LM [36,45], manually design a specialized execution plan that combines these parallelisms for transformer language models.…”
Section: Conventional View of ML Parallelism (mentioning, confidence: 99%)