2022
DOI: 10.48550/arXiv.2209.14981
Preprint

Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging

Abstract: Training vision or language models on large datasets can take days, if not weeks. We show that averaging the weights of the k latest checkpoints, each collected at the end of an epoch, can speed up the training progression in terms of loss and accuracy by dozens of epochs, corresponding to time savings of up to 68 and 30 GPU hours when training a ResNet50 on ImageNet and a RoBERTa-Base model on WikiText-103, respectively. We also provide the code and model checkpoint trajectory to reproduce the results and facilita…
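
The abstract gives enough detail to sketch the core idea. Below is a minimal PyTorch sketch of latest weight averaging over a sliding window of end-of-epoch checkpoints; the toy linear model, the synthetic data, the choice k = 3, and the helper `average_state_dicts` are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of latest weight averaging (LAWA) as described in the
# abstract: keep the k most recent end-of-epoch checkpoints and evaluate
# their element-wise mean. Model, data, and k are toy placeholders.
from collections import deque

import torch
import torch.nn as nn


def average_state_dicts(state_dicts):
    """Element-wise mean over state dicts with identical keys and shapes."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }


model = nn.Linear(10, 1)      # stand-in for ResNet50 / RoBERTa-Base
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
k = 3                         # number of latest checkpoints to average
latest = deque(maxlen=k)      # snapshots older than k fall out automatically

for epoch in range(10):       # stand-in for a real training loop
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Snapshot the weights at the end of each epoch (detached, on CPU).
    latest.append({name: tensor.detach().cpu().clone()
                   for name, tensor in model.state_dict().items()})

    if len(latest) == k:
        # Evaluate the mean of the k latest checkpoints; training itself
        # continues from the un-averaged weights.
        avg_model = nn.Linear(10, 1)
        avg_model.load_state_dict(average_state_dicts(list(latest)))
```

Note that in a real run, integer buffers such as BatchNorm's `num_batches_tracked` should be copied rather than averaged, and since the averaging is applied along the saved checkpoint trajectory, the extra cost over plain training is only a cheap state-dict mean per epoch.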

Cited by 0 publications
References 23 publications