2020
DOI: 10.48550/arxiv.2010.03533
Preprint

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Abstract: Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly…

Cited by 8 publications (21 citation statements)
References 13 publications
“…We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization. Our subsequent experiments on common vision tasks give strong credence to the hypothesis in Evci et al (2020b) that lottery tickets simply retrain to the same regions (although not necessarily to the same basin). These results imply that existing lottery tickets could not have been found without the preceding dense training by iterative magnitude pruning, raising doubts about the use of the lottery ticket hypothesis.…”
supporting
confidence: 56%
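As a rough illustration of the linear mode connectivity test invoked in this statement, the sketch below interpolates between two trained solutions along a straight line in weight space and reports the loss barrier. This is a minimal sketch, assuming two PyTorch models with identical architectures and a user-supplied `evaluate(model, loader)` helper; the names are illustrative and not taken from the cited papers.

```python
# Minimal sketch of a linear mode connectivity check (assumed helper names).
import copy
import torch

def loss_barrier(model_a, model_b, evaluate, loader, steps=11):
    """Loss along the straight line between two solutions; a small barrier is
    commonly read as the two solutions lying in one linearly connected region."""
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        mixed = {}
        for k in state_a:
            if state_a[k].is_floating_point():
                mixed[k] = (1 - alpha) * state_a[k] + alpha * state_b[k]
            else:
                mixed[k] = state_a[k]  # integer buffers (e.g. batch-norm counters)
        probe.load_state_dict(mixed)
        losses.append(evaluate(probe, loader))
    # Barrier: worst interpolated loss minus the average of the endpoint losses.
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```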
“…IMP Interpretation. Evci et al (2020b) recently proposed a possible interpretation for the behavior of IMP and the success of the LTH. The authors posit that lottery tickets cannot be considered random initializations but that a lottery ticket contains a prior for rediscovering the solution of the model from which it was pruned.…”
Section: Introduction
mentioning
confidence: 99%
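For context, the iterative magnitude pruning (IMP) loop with weight rewinding that this interpretation concerns can be sketched as follows. This is a minimal sketch, assuming placeholder `build_model` and `train` helpers; the global pruning schedule is illustrative rather than the exact procedure of the cited works.

```python
# Minimal sketch of iterative magnitude pruning (IMP) with weight rewinding.
# `build_model` and `train` are assumed placeholders.
import copy
import torch

def imp_with_rewinding(build_model, train, rounds=5, prune_frac=0.2):
    model = build_model()
    init_state = copy.deepcopy(model.state_dict())           # state to rewind to
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if p.dim() > 1}                                  # prune weight tensors only

    for _ in range(rounds):
        train(model, masks)  # the trainer is expected to keep masked weights at zero

        # Globally rank surviving weights by magnitude, drop the smallest fraction.
        scores = torch.cat([(p.abs() * masks[n]).flatten()
                            for n, p in model.named_parameters() if n in masks])
        alive = scores[scores > 0]
        k = max(1, int(prune_frac * alive.numel()))
        threshold = torch.kthvalue(alive, k).values
        for n, p in model.named_parameters():
            if n in masks:
                masks[n] = (p.abs() * masks[n] > threshold).float()

        # Rewind: the "lottery ticket" is the pair (init_state, masks).
        model.load_state_dict(init_state)

    return model, masks
```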
“…They also allow reallocation of the weights across the layers. In [11,7,25,12], the gradient information is used to determine which connections would be changed during the evolution phase.…”
Section: Related Work
mentioning
confidence: 99%
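To make the gradient-guided "evolution" step concrete, here is a rough sketch of a drop-and-grow update for one sparse layer in the spirit of RigL (Evci et al., 2020): drop the smallest-magnitude active connections and regrow the same number where the dense gradient magnitude is largest. The function signature and update fraction are illustrative assumptions, not the exact rule of the cited methods.

```python
# Minimal sketch of one gradient-guided drop-and-grow step for a sparse layer.
import torch

def drop_and_grow(weight, mask, grad, update_frac=0.3):
    n_update = max(1, int(update_frac * int(mask.sum())))

    # Drop: among active connections, remove those with the smallest |w|.
    drop_scores = torch.where(mask.bool(), weight.abs(),
                              torch.full_like(weight, float('inf')))
    drop_idx = torch.topk(drop_scores.view(-1), n_update, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0

    # Grow: among inactive connections, enable those with the largest |grad|.
    grow_scores = torch.where(mask.bool(),
                              torch.full_like(grad, -float('inf')),
                              grad.abs())
    grow_idx = torch.topk(grow_scores.view(-1), n_update).indices
    mask.view(-1)[grow_idx] = 1.0
    weight.data.view(-1)[grow_idx] = 0.0   # newly grown connections start at zero
    return mask
```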
“…Pruning. Pruning strategies for neural networks can be roughly separated into structured (Han et al., 2016; Li et al., 2016; Liu et al., 2017) and unstructured (Evci et al., 2019, 2020; Frankle and Carbin, 2018; Han et al., 2015) variants. Structured pruning, as considered in this work, prunes parameter groups instead of individual weights, allowing speedups to be achieved without sparse computation (Li et al., 2016).…”
Section: Related Work
mentioning
confidence: 99%
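To illustrate the distinction drawn in that statement, the sketch below builds a per-weight (unstructured) mask and a per-filter (structured) mask for a convolutional weight. The thresholds and filter norms used here are illustrative assumptions rather than the exact criteria of the cited works; the practical point is that structured pruning leaves a smaller dense layer, whereas unstructured pruning needs sparse kernels to realize speedups.

```python
# Minimal sketch contrasting unstructured and structured magnitude pruning
# for a conv weight of shape (out_channels, in_channels, kH, kW).
import torch

def unstructured_mask(weight, sparsity=0.9):
    """Per-weight pruning: zero individual weights below a magnitude threshold."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = torch.kthvalue(weight.abs().flatten(), k).values
    return (weight.abs() > threshold).float()

def structured_mask(weight, prune_frac=0.5):
    """Per-filter pruning: drop whole output channels, so the surviving layer
    is simply a smaller dense convolution (no sparse kernels required)."""
    filter_norms = weight.abs().sum(dim=(1, 2, 3))         # L1 norm of each filter
    n_keep = weight.shape[0] - int(prune_frac * weight.shape[0])
    keep = torch.topk(filter_norms, n_keep).indices
    mask = torch.zeros_like(weight)
    mask[keep] = 1.0
    return mask
```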