2019
DOI: 10.48550/arxiv.1909.11957
Preprint

Drawing Early-Bird Tickets: Towards More Efficient Training of Deep Networks

Cited by 24 publications (43 citation statements)
References 14 publications
“…The Lottery Ticket Hypothesis (LTH) states that typical dense neural networks contain a small sparse sub-network that can be trained to reach similar test accuracy in an equal number of steps (Frankle and Carbin, 2018). Building on this, follow-up work reveals that sparsity patterns might emerge at initialization, at the early stage of training (You et al, 2019; Chen et al, 2020b), or in dynamic forms throughout training (Evci et al, 2020) by updating model parameters and architecture topologies simultaneously. Among the recent findings is that the lottery ticket hypothesis holds for BERT models, i.e., the largest weights of the original network form subnetworks that can be retrained alone to reach performance close to that of the full model (Prasanna et al, 2020; Chen et al, 2020a).…”
Section: The Lottery Ticket Hypothesis
confidence: 99%
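For concreteness, here is a minimal sketch of the iterative magnitude pruning procedure that the statement above summarizes: train, prune the smallest-magnitude weights globally, rewind the survivors to their initial values, and repeat. The `model` and `train` arguments, the per-round pruning fraction, and the number of rounds are illustrative assumptions, not code from any of the cited papers.

```python
# A minimal sketch of iterative magnitude pruning with weight rewinding, the
# procedure behind the Lottery Ticket Hypothesis (Frankle and Carbin, 2018).
# `train(model, masks)` is a hypothetical training loop supplied by the caller;
# the pruning fraction and number of rounds are illustrative values only.
import copy
import torch

def find_winning_ticket(model, train, prune_fraction=0.2, rounds=5):
    init_state = copy.deepcopy(model.state_dict())  # theta_0, kept for rewinding
    # One binary mask per weight tensor; biases are left dense in this sketch.
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train(model, masks)  # train the masked (sparse) network to completion

        with torch.no_grad():
            # Rank the surviving weights globally and drop the smallest fraction.
            surviving = torch.cat([
                p.abs().flatten()[masks[n].flatten() > 0]
                for n, p in model.named_parameters() if n in masks
            ])
            threshold = torch.quantile(surviving, prune_fraction)
            for n, p in model.named_parameters():
                if n in masks:
                    masks[n] *= (p.abs() > threshold).float()

        # Rewind the surviving weights to their original initialization.
        model.load_state_dict(init_state)

    return masks  # the sparse "winning ticket" sub-network
```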
“…Subnetworks that are found on the masked language modeling task transfer universally; those found on other tasks transfer in a limited fashion, if at all. EarlyBERT (Chen et al, 2020b) extends the work on finding lottery tickets in CNNs (You et al, 2019) to speed up both pre-training and fine-tuning for BERT models. You et al (2019) observed that sparsity patterns might already emerge at the early stage of training.…”
Section: The Lottery Ticket Hypothesis
confidence: 99%
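The early-stage observation in the excerpt above corresponds to the early-bird ticket idea of You et al (2019): the channel-pruning mask derived from batch-norm scaling factors stabilizes long before training converges, so the ticket can be drawn early. Below is a hedged sketch of that mask-distance check; the pruning ratio, the threshold `eps`, the window length, and the helper names `bn_channel_mask` / `early_bird_emerged` are all illustrative assumptions.

```python
# A hedged sketch of early-bird ticket detection in the spirit of You et al (2019):
# periodically derive a channel-pruning mask from batch-norm scaling factors and
# stop dense training once consecutive masks barely change.
import torch
import torch.nn as nn

def bn_channel_mask(model, prune_ratio=0.5):
    """Boolean mask over all BN channels: keep the largest scaling factors (gamma)."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)
    return torch.cat([m.weight.detach().abs().flatten() > threshold
                      for m in model.modules() if isinstance(m, nn.BatchNorm2d)])

def early_bird_emerged(mask_history, new_mask, eps=0.1, window=5):
    """True once the normalized Hamming distance between the newest mask and the
    previous `window` masks stays below `eps`, i.e. the ticket has stabilized."""
    mask_history.append(new_mask)
    if len(mask_history) <= window:
        return False
    recent = mask_history[-(window + 1):]
    distances = [(recent[i] ^ recent[-1]).float().mean().item() for i in range(window)]
    return max(distances) < eps
```

In this sketch one would call `bn_channel_mask` once per epoch and stop the dense run as soon as `early_bird_emerged` returns True, then retrain only the kept channels; that early stop is roughly where the training-cost savings reported for early-bird tickets come from.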
“…Pruning typically follows a three-step process of pre-training, pruning, and fine-tuning (Li et al, 2016). Pre-training is usually the most expensive component, but later work explores strategies for finding good pruned networks with minimal pre-training (You et al, 2019; Chen et al, 2020).…”
Section: Related Work
confidence: 99%
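The conventional three-step pipeline mentioned in the excerpt above can be sketched with PyTorch's built-in magnitude pruning utilities; `pretrain` and `finetune` are hypothetical training loops supplied by the caller, and the 50% global sparsity is an arbitrary example value.

```python
# A minimal sketch of the three-step pipeline (pre-train, prune, fine-tune)
# using PyTorch's built-in global magnitude pruning.
import torch.nn as nn
import torch.nn.utils.prune as prune

def three_step_pruning(model, pretrain, finetune, amount=0.5):
    pretrain(model)  # 1. dense pre-training (usually the most expensive step)

    # 2. prune: globally remove the smallest-magnitude weights in conv/linear layers
    to_prune = [(m, "weight") for m in model.modules()
                if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured,
                              amount=amount)

    finetune(model)  # 3. recover accuracy at the fixed sparsity level

    for module, name in to_prune:
        prune.remove(module, name)  # fold the masks into the weights permanently
    return model
```

The strategies cited above (You et al, 2019; Chen et al, 2020) aim to shrink step 1 by drawing the sparse network early rather than after full pre-training.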
“…However, resource-constrained training had not been explored much until a few recent efforts on classification [18,32,36].…”
Section: Introduction
confidence: 99%