2020
DOI: 10.48550/arxiv.2007.08243
Preprint

Lottery Tickets in Linear Models: An Analysis of Iterative Magnitude Pruning

Abstract: We analyse the pruning procedure behind the lottery ticket hypothesis, iterative magnitude pruning (IMP), when applied to linear models trained by gradient flow. We begin by presenting sufficient conditions on the statistical structure of the features, under which IMP prunes those features that have smallest projection onto the data. Following this, we explore IMP as a method for sparse estimation and sparse prediction in noisy settings. The same techniques are then applied to derive corresponding results for…
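As a rough illustration of the procedure the abstract describes, the following is a minimal sketch of IMP on a linear regression model, using full-batch gradient descent as a discrete stand-in for gradient flow. The data, learning rate, pruning fraction, and number of rounds are assumptions made for this example, not the paper's setup.

# Minimal sketch of iterative magnitude pruning (IMP) on a linear model.
# Each round: retrain from the shared initialization under the current mask,
# then prune a fraction of the smallest-magnitude surviving weights.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))               # feature matrix (rows are samples)
w_true = np.zeros(d); w_true[:5] = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=n)

def train(mask, w0, lr=0.01, steps=2000):
    """Full-batch gradient descent on squared loss; pruned weights stay at zero."""
    w = w0 * mask
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = w - lr * grad * mask          # masked weights receive no updates
    return w

w0 = 0.1 * rng.normal(size=d)             # shared initialization for every round
mask = np.ones(d)
for _round in range(5):                   # prune 20% of surviving weights per round
    w = train(mask, w0)
    k = int(0.2 * int(mask.sum()))
    if k == 0:
        break
    surviving = np.where(mask == 1)[0]
    smallest = surviving[np.argsort(np.abs(w[surviving]))[:k]]
    mask[smallest] = 0.0                  # remove smallest-magnitude surviving weights
    # "rewind": the next round retrains from w0 with the updated mask

print("kept features:", np.where(mask == 1)[0])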

Cited by 7 publications (8 citation statements)
References 7 publications
“…While dynamical systems theory is a natural language with which to frame DNN optimization, the complex dependence on optimizer, architecture, activation function, and training data has historically kept efforts in this direction to a minimum. This has led to a reliance on heuristic methods, such as iterative magnitude pruning, whose basis of success is still not clear [8]. Other groups have attempted to more principally examine DNN behavior by studying mathematical objects, such as the spectrum of the Hessian matrix [12] and the spectrum of the principal orthogonal decomposition [24].…”
Section: Discussion
confidence: 99%
“…where $\overline{W}$ is the mean of $W$ with entries $\overline{W}_i = \lambda_i^0 w_i^P$, and $A(P)$ is the feature alignment vector for the data $S_P$, with entries $A_i(P) = \frac{1}{|X_P|} \psi_i(X_P)^T Y_P$, i.e., $A_i(P)$ measures how closely feature $i$ is aligned to the outputs $Y_P$ [20]. Then $\mathbb{E}[W^T \Sigma_P W] = \operatorname{Tr}(\Gamma \Sigma_P)$, where…”
Section: Linear Model Analysis
confidence: 99%
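The quoted definition of the feature alignment vector is straightforward to compute. Below is a minimal sketch, assuming a generic feature matrix whose columns stand in for the evaluated features $\psi_i(X_P)$ and a target vector standing in for $Y_P$; the names and data are illustrative, not taken from the cited paper.

# Illustrative computation of the feature alignment vector A(P) from the
# quoted definition A_i(P) = (1/|X_P|) * psi_i(X_P)^T Y_P.
import numpy as np

def feature_alignment(Psi, Y):
    """Psi: (num_samples, num_features) matrix whose columns are psi_i(X_P);
    Y: (num_samples,) targets. Returns A with A_i = (1/|X_P|) psi_i(X_P)^T Y."""
    return Psi.T @ Y / Psi.shape[0]

rng = np.random.default_rng(1)
Psi = rng.normal(size=(500, 8))           # stand-in for the evaluated features
Y = Psi[:, 0] - 0.5 * Psi[:, 3]           # targets depending on features 0 and 3
A = feature_alignment(Psi, Y)
print(np.round(A, 2))                     # features 0 and 3 have the largest |A_i|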
“…Recent work by Elesedy, Kanade, and Teh [20] has shown that, in the context of linear models, magnitude pruning zeros out the weights based on the magnitude of feature alignment under certain assumptions on the feature covariance matrix. Our analysis is complementary to these findings.…”
Section: Related Work
confidence: 99%
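As a hedged numerical illustration of the quoted claim: if the empirical feature covariance is close to the identity (one simple case of the assumptions mentioned), the weights reached by gradient descent from a zero initialization approximate the alignment vector A = X^T y / n, so ranking weights by magnitude and ranking features by alignment give the same pruning order. The toy setup below is assumed for illustration and is not the construction used by Elesedy, Kanade, and Teh.

# With i.i.d. Gaussian features the empirical covariance X^T X / n is close to
# the identity, so the trained weights approximate the alignment vector A and
# the two pruning orders coincide on the leading features.
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 10
X = rng.normal(size=(n, d))               # features with covariance close to I
w_star = np.array([3.0, 0, 1.0, 0, 0, 2.0, 0, 0, 0.5, 0])
y = X @ w_star + 0.1 * rng.normal(size=n)

w = np.zeros(d)                           # zero init (gradient-flow analogue)
for _ in range(3000):
    w -= 0.05 * X.T @ (X @ w - y) / n     # full-batch gradient descent

A = X.T @ y / n                           # feature alignment vector
print(np.argsort(-np.abs(w))[:4])         # pruning order by weight magnitude
print(np.argsort(-np.abs(A))[:4])         # pruning order by alignment magnitude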
“…While exciting, to date there exists no principled understanding of why winning tickets can be transferred between tasks, nor does there exist a way to know, a priori, which tasks a given winning ticket can be transferred to. Additionally, there is a lack of theoretical work on iterative magnitude pruning (IMP) [11], the most common method used to find winning tickets. This is in striking analogy to the state of statistical physics in the early-to-mid-20th century.…”
Section: Introduction
confidence: 99%