2022
DOI: 10.48550/arxiv.2202.12002
Preprint

Rare Gems: Finding Lottery Tickets at Initialization

Abstract: It has been widely observed that large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, typically by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin (2018) conjecture that we can avoid this by training lottery tickets, i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work presents concrete evidence that current algorithms for finding trainable networks…
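
As a point of reference for the pipeline the abstract contrasts against, below is a minimal sketch of the conventional "train, prune, re-train" loop (magnitude pruning), not the method proposed in the paper. The model, the `train_one_epoch` helper, and the data loader are hypothetical placeholders.

```python
# Minimal sketch of the "train, prune, re-train" pipeline the abstract refers to
# (magnitude pruning). `train_one_epoch` and `train_loader` are hypothetical
# placeholders, not part of the paper.
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float) -> dict:
    """Zero out the smallest-magnitude weights globally; return binary masks."""
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    k = int(sparsity * all_weights.numel())
    threshold = all_weights.kthvalue(k).values if k > 0 else torch.tensor(0.0)
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices/kernels, leave biases dense
            masks[name] = (p.detach().abs() > threshold).float()
            p.data.mul_(masks[name])
    return masks

def train_prune_retrain(model, train_one_epoch, train_loader,
                        sparsity=0.9, epochs=10):
    for _ in range(epochs):                       # 1) train the dense network
        train_one_epoch(model, train_loader)
    masks = magnitude_prune(model, sparsity)      # 2) prune by weight magnitude
    for _ in range(epochs):                       # 3) re-train the sparse network
        train_one_epoch(model, train_loader)
        for name, p in model.named_parameters():  #    keep pruned weights at zero
            if name in masks:
                p.data.mul_(masks[name])
    return model, masks
```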

Cited by 4 publications (5 citation statements)
References 18 publications
“…Later, Pensia et al. (2020) improved the widening factor to logarithmic, and Sreenivasan et al. (2021) proved that, with a polylogarithmic widening factor, such a result holds even if the network weights are binary. A follow-up work shows that it is possible to find a subnetwork achieving good performance at initialization and then fine-tune it (Sreenivasan et al., 2022). Our work, on the other hand, analyzes the gradient descent dynamics of a pruned neural network and its generalization after training.…”
Section: Related Work
confidence: 99%
“…This paper has inspired a quickly growing body of work that has found that a network contains not one but several tickets [18], which may be connected [19]. Moreover, after it was shown that tickets can be identified early in the training process [20], some methods have succeeded in pruning before training [21]-[30], even before looking at the data, in order to reduce training cost.…”
Section: The Weak Lottery Ticket Hypothesis
confidence: 99%
“…Here, we discuss possible ways of improving the performance of our method. Note that a recent work (Sreenivasan et al. 2022) on pruning discriminative networks found two methods to improve the performance of EP: (1) using global EP (pruning weights by sorting the scores globally) instead of vanilla EP (pruning weights by sorting the scores at each layer), and (2) using gradual pruning (moving from the dense regime to the sparse regime gradually during pruning) instead of vanilla EP, which moves to the sparse regime from the beginning. Inspired by this observation, we expect that applying EP with these two variants (global pruning and gradual pruning) in our method has the potential to improve the performance of the SLT in generative models.…”
Section: Factor Analysis
confidence: 99%
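
To make the two EP variants mentioned in the statement above concrete, here is an illustrative sketch (not code from either cited paper) of layer-wise versus global score thresholding, plus a gradual dense-to-sparse schedule; the function names, score dictionaries, and cubic schedule are assumptions for illustration only.

```python
# Illustrative sketch of the two edge-popup (EP) variants discussed above:
# (1) global vs. layer-wise score thresholding, and (2) a gradual sparsity
# schedule. Function names and the cubic schedule are assumptions, not the
# exact procedure of the cited papers.
import torch

def layerwise_masks(scores: dict, sparsity: float) -> dict:
    """Vanilla EP: keep the top-scoring weights within each layer separately."""
    masks = {}
    for name, s in scores.items():
        k = int(sparsity * s.numel())
        if k == 0:
            masks[name] = torch.ones_like(s)
            continue
        threshold = s.flatten().kthvalue(k).values
        masks[name] = (s > threshold).float()
    return masks

def global_masks(scores: dict, sparsity: float) -> dict:
    """Global EP: rank all scores together, so per-layer sparsity can differ."""
    all_scores = torch.cat([s.flatten() for s in scores.values()])
    k = int(sparsity * all_scores.numel())
    threshold = all_scores.kthvalue(k).values if k > 0 else float("-inf")
    return {name: (s > threshold).float() for name, s in scores.items()}

def gradual_sparsity(step: int, total_steps: int, final_sparsity: float) -> float:
    """Move from the dense regime (0% sparsity) toward the target sparsity over
    training; a cubic schedule is used here purely for illustration."""
    t = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - t) ** 3)
```

During training, one would recompute the masks at each step, e.g. `global_masks(scores, gradual_sparsity(step, total_steps, 0.99))`, so the subnetwork moves gradually from dense to sparse rather than starting in the sparse regime.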
“…In this work, we focus on the model pruning techniques that fall into the latter category. Unlike discriminative models, where various pruning techniques (LeCun, Denker, and Solla 1989; Hassibi and Stork 1992; Han et al. 2015; Frankle and Carbin 2018; Ramanujan et al. 2020; Sreenivasan et al. 2022) have been actively studied, pruning generative models has not been extensively explored. Moreover, it has been found that naïve application of existing pruning methods (developed for discriminative models) to generative models leads to performance degradation and/or unstable training (Wang et al. 2020; Li et al. 2021).…”
Section: Introduction
confidence: 99%