2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SustainCom/SocialCom) 2019
DOI: 10.1109/ispa-bdcloud-sustaincom-socialcom48970.2019.00074
A Variable Batch Size Strategy for Large Scale Distributed DNN Training

Cited by 10 publications (7 citation statements) | References 13 publications
“…As training is scaled up to 128 GPUs across 64 nodes, all models shift to lower memory utilization; however, we see that the GNNs also see lower SM utilization alongside memory utilization. The decrease in memory utilization observed across models with an increasing number of GPUs is in line with our expectation that batch size scaling is still important for efficient distributed training [16]. Something to note is that while BERT sees about a two-fold reduction in memory utilization (from about 40% to 20%) as we scale up training from 2 to 128 GPUs, SM utilization barely changes (and slightly increases).…”
Section: Experimental Results and Findings (supporting)
confidence: 82%
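The statement above ties falling per-GPU memory utilization to how the batch is scaled out across workers. As an illustration only, here is a minimal sketch (not taken from the cited works) of weak batch-size scaling, assuming PyTorch DistributedDataParallel and a torchrun launch: the per-GPU batch stays fixed, so the effective global batch grows with the number of GPUs.

```python
# Minimal sketch (not from the cited papers) of weak batch-size scaling with
# PyTorch DistributedDataParallel: the per-GPU batch size stays fixed, so the
# effective global batch grows with the number of GPUs. Launch with e.g.
#   torchrun --nproc_per_node=4 ddp_weak_scaling.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

PER_GPU_BATCH = 64  # kept constant as GPUs are added (assumed value)

def main():
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    rank, world = dist.get_rank(), dist.get_world_size()
    device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")

    model = DDP(torch.nn.Linear(1024, 10).to(device))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    # Effective (global) batch size = per-GPU batch * number of workers.
    if rank == 0:
        print(f"global batch = {PER_GPU_BATCH * world}")

    x = torch.randn(PER_GPU_BATCH, 1024, device=device)
    y = torch.randint(0, 10, (PER_GPU_BATCH,), device=device)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()  # gradients are all-reduced across GPUs
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```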
“…Three convolutional layers and a fully connected layer are selected. The x‐axis represents the number of training epochs [10].…”
Section: Variable Batch Size Strategy (mentioning)
confidence: 99%
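For illustration of the kind of network that caption describes, a hypothetical PyTorch sketch with three convolutional layers and one fully connected layer; the channel widths, kernel sizes, and 32x32 input are assumptions, not taken from the cited paper.

```python
# Hypothetical sketch: three convolutional layers followed by one fully
# connected layer. Channel widths, kernel sizes, and input size are assumed.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(128 * 4 * 4, num_classes)  # for 32x32 inputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

print(SmallConvNet()(torch.randn(8, 3, 32, 32)).shape)  # torch.Size([8, 10])
```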
“…Top‐1 training error and test (validation) error of ResNet‐50 network trained using ImageNet‐1K dataset with different batch sizes [10].…”
Section: Introduction (mentioning)
confidence: 99%
“…Introduction. Over the past decade, a family of machine learning algorithms known as deep learning has led to significant improvements in computer vision [10] and in natural language recognition and processing [2,3]. This has led to the widespread use of various learning-based commercial products across many areas of human activity.…”
unclassified
“…Several recent works discuss the use of a large learning rate [7] and of small [8] and large [9,10] mini-batch sizes of the training data. They demonstrate that the ratio of learning rate to mini-batch size affects training.…”
unclassified
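As a worked illustration of that ratio, a minimal sketch of the commonly used linear scaling rule, which keeps learning rate / batch size constant; this is a general technique stated as an assumption, not the cited papers' exact procedure.

```python
# Minimal sketch of the linear scaling rule: when the mini-batch grows by a
# factor k, the learning rate is scaled by the same factor, keeping the
# learning-rate-to-batch-size ratio fixed. Reference values are assumptions.
def scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Keep lr / batch constant: lr = base_lr * (batch / base_batch)."""
    return base_lr * batch / base_batch

# Example: a reference setup of lr=0.1 at batch 256, scaled up to a global
# batch of 8192 (e.g. 32 workers x 256 per worker), gives lr = 3.2.
for batch in (256, 1024, 8192):
    print(batch, scaled_lr(0.1, 256, batch))
```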