Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2022
DOI: 10.1145/3503222.3507778

Breaking the computation and communication abstraction barrier in distributed machine learning workloads

Abstract: Recent trends towards large machine learning models require both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain best performance. However, the current logical separation between computation and communication kernels in machine learning frameworks misses optimization opportunities across this barrier. Breaking this abstraction can provide many optimizations to improve the perf…
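The abstract's claim is that fusing or overlapping communication with the computation that depends on it can hide communication latency. As a loose illustration of that idea (a minimal sketch using PyTorch's asynchronous collectives, not the paper's CoCoNet implementation or API; the function name, chunk count, and dropout epilogue are assumed for this example), the snippet below pipelines a chunked all-reduce with a dependent dropout-and-residual-add epilogue so that later chunks communicate while earlier chunks compute:

```python
# Minimal sketch (assumed example, not the paper's CoCoNet system): overlap a
# chunked all-reduce with the elementwise epilogue that consumes its result.
import torch
import torch.distributed as dist
import torch.nn.functional as F

def allreduce_dropout_add(grad: torch.Tensor, residual: torch.Tensor,
                          p: float = 0.1, n_chunks: int = 4) -> torch.Tensor:
    """Launch the all-reduce chunk by chunk, then run the epilogue on each
    chunk as soon as its communication finishes, overlapping with the rest."""
    g_chunks = list(grad.chunk(n_chunks))
    r_chunks = list(residual.chunk(n_chunks))
    # Kick off communication for every chunk up front (non-blocking).
    handles = [dist.all_reduce(c, async_op=True) for c in g_chunks]
    out = []
    for handle, g, r in zip(handles, g_chunks, r_chunks):
        handle.wait()                                      # wait for this chunk only
        out.append(F.dropout(g, p=p, training=True) + r)   # compute while later chunks communicate
    return torch.cat(out)
```

The same pattern generalizes to other epilogues (bias add, layer norm, and so on); the point is only that the collective and the dependent computation no longer need to run as separate, serialized kernels.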

Cited by 18 publications (7 citation statements); references 27 publications (37 reference statements).

Citation statements, ordered by relevance:
“…Models and their deployment: Since Transformers are fast-evolving, we evaluate T3's impact on a range of Transformer models and TP degrees (Table 2). For Megatron-GPT-2 (Mega-GPT-2) [78] and T-NLG [47] we use 16K and 8K input tokens (= input-length * batch-size) and TP degrees of eight and 16, given their modern intra-node setups [25,36,47,78].…”
Section: Applications, Deployment, and GEMMs (mentioning)
confidence: 99%
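The token accounting quoted above (input tokens = input-length * batch-size) is just a product; the actual sequence lengths and batch sizes used in that evaluation are not stated in the snippet, so the values below are assumed purely for illustration:

```python
# Hypothetical numbers chosen only to show the bookkeeping; the cited
# evaluation's real input-length and batch-size are not given in this snippet.
input_length, batch_size = 2048, 8
input_tokens = input_length * batch_size   # 16384, i.e. the "16K" figure
```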
[Flattened table fragment from the citing paper: Table 3, “Comparing T3-MCA to prior work,” contrasts In-switch [36], ACE [71], CoCoNet [25], Google Decomposition [86], and T3-MCA on properties including topology independence and requiring no additional accelerator.]
Section: No Additional Accelerator (mentioning)
confidence: 99%
“…GShard [8] creates shards of weights and model states that can be split among ranks. CoCoNet [13] introduces a domain-specific language to easily express communication and computation in distributed training. Megatron-LM [10] introduces an efficient intra-layer model-parallel approach to support training of very large transformer models.…”
Section: Distributed Neural Network Training (mentioning)
confidence: 99%
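The last statement groups CoCoNet with GShard's sharded weights and model states and Megatron-LM's intra-layer model parallelism. As a minimal sketch of what intra-layer (tensor) parallelism looks like in that style (assumed shapes and process-group setup; an illustration, not Megatron-LM's actual code), each rank holds a column shard of the first weight matrix and a row shard of the second, and a single all-reduce combines the partial outputs:

```python
# Minimal sketch of intra-layer (tensor) model parallelism, Megatron-LM style.
# Assumed: the default process group is already initialized and every rank
# holds its own weight shard; shapes and names here are illustrative only.
import torch
import torch.distributed as dist

def tensor_parallel_mlp(x: torch.Tensor,
                        w1_shard: torch.Tensor,   # [d_model, d_ff / world_size]
                        w2_shard: torch.Tensor    # [d_ff / world_size, d_model]
                        ) -> torch.Tensor:
    h = torch.relu(x @ w1_shard)   # column-parallel GEMM: no communication needed
    partial = h @ w2_shard         # row-parallel GEMM: yields a partial sum per rank
    dist.all_reduce(partial)       # sum the partial outputs across tensor-parallel ranks
    return partial
```

The trailing all-reduce here is exactly the kind of collective that the paper's joint view of computation and communication aims to fuse or overlap with the surrounding kernels.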