2021
DOI: 10.48550/arxiv.2106.06955
Preprint

Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win

Jaron Maene, Mingxiao Li, Marie-Francine Moens

Abstract: The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in. However, subsequent work has failed to replicate this on large-scale models and has instead required rewinding to an early, stable state rather than to initialization. We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization. Our subsequent e…
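The stability notion referenced in the abstract is commonly checked by interpolating linearly between two trained solutions and measuring how far the loss rises above the endpoints. Below is a minimal sketch of such a check; the evaluate_loss() helper and the parameter-dictionary format are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a linear mode connectivity check, assuming two sets of
# trained parameters (e.g. obtained from the same starting point with
# different SGD noise) and a hypothetical evaluate_loss(params) helper that
# returns the test loss for a given parameter dictionary.
import numpy as np

def linear_interpolation_barrier(params_a, params_b, evaluate_loss, steps=11):
    """Evaluate the loss along the straight line between two solutions.

    Two networks are considered linearly mode connected when the loss along
    this path does not rise much above the loss at the endpoints.
    """
    losses = []
    for alpha in np.linspace(0.0, 1.0, steps):
        interpolated = {
            name: (1.0 - alpha) * params_a[name] + alpha * params_b[name]
            for name in params_a
        }
        losses.append(evaluate_loss(interpolated))
    # The "error barrier" is how far the interpolation path rises above
    # the worse of the two endpoints.
    return max(losses) - max(losses[0], losses[-1])
```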

Cited by 3 publications (3 citation statements) | References 9 publications
“…1) Iterative Magnitude Pruning: Iterative Magnitude Pruning is a common approach that starts by training a dense network and subsequently removes weights based on a specific criterion, such as magnitude (absolute value) [16]. For optimal results, this process is typically repeated iteratively by alternating between weight pruning and network retraining.…”
Section: B. Pruning Strategies
confidence: 99%
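The prune-and-retrain loop described in the excerpt above can be summarized in a short sketch. The following PyTorch-style code is illustrative only: the train() helper, the per-round prune fraction, and the per-tensor masking scheme are assumptions, not the exact procedure of the cited work.

```python
# Minimal sketch of iterative magnitude pruning (IMP), assuming a PyTorch
# model and a hypothetical train(model, masks) helper that trains only the
# unmasked weights.
import torch

def iterative_magnitude_pruning(model, train, rounds=5, prune_fraction=0.2):
    # Keep a binary mask per weight tensor; 1 = kept, 0 = pruned.
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train(model, masks)  # retrain the surviving weights
        for name, param in model.named_parameters():
            alive = param[masks[name].bool()].abs()
            if alive.numel() == 0:
                continue
            # Prune the smallest-magnitude fraction of the remaining weights.
            threshold = torch.quantile(alive, prune_fraction)
            masks[name] = masks[name] * (param.abs() > threshold).float()
            param.data.mul_(masks[name])  # zero out the newly pruned weights
    return masks
```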
“…IMP is an effective method for reducing the size of neural networks and improving their efficiency without significant loss of accuracy. IMP has also been used to prune LLMs [16], but it has limitations such as retraining overhead, dense connections, and structural redundancy in the transformer architecture.…”
Section: B. Pruning Strategies
confidence: 99%
“…Parameters that are not in this low-dimensional subspace can, therefore, be removed with minimal impact. If a sparse DNN is initialized in this subspace (as late rewinding aims to do), then it may be possible for training to find the same, or related, local minima as the full DNN [13,27].…”
Section: Connection Between the RG and Standard LTH Framework
confidence: 99%
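As a rough illustration of the rewinding idea discussed in the excerpt above, the sketch below saves a copy of the weights after a few epochs, finishes dense training, builds a global magnitude mask, and then rewinds the surviving weights to the saved copy before retraining sparsely. The train_for_epochs() helper, the rewind epoch, and the global-threshold masking are assumptions for illustration, not the cited papers' exact setup.

```python
# Minimal sketch of weight rewinding to an early "stable" state, assuming a
# PyTorch model and a hypothetical train_for_epochs(model, epochs) helper.
import copy
import torch

def make_rewound_ticket(model, train_for_epochs, rewind_epoch=2,
                        total_epochs=50, prune_fraction=0.8):
    train_for_epochs(model, rewind_epoch)                  # short stabilizing phase
    rewind_state = copy.deepcopy(model.state_dict())       # weights to rewind to
    train_for_epochs(model, total_epochs - rewind_epoch)   # finish dense training
    # Build a global magnitude mask from the fully trained weights.
    # (torch.quantile may need chunking for very large models.)
    all_weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    threshold = torch.quantile(all_weights, prune_fraction)
    masks = {n: (p.detach().abs() > threshold).float()
             for n, p in model.named_parameters()}
    # Rewind the surviving weights to their early values and retrain sparsely.
    model.load_state_dict(rewind_state)
    for name, param in model.named_parameters():
        param.data.mul_(masks[name])
    train_for_epochs(model, total_epochs)
    return model, masks
```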