2022
DOI: 10.1609/aaai.v36i6.20611
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Abstract: Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly…
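To make the setup described in the abstract concrete, below is a minimal sketch (PyTorch, with illustrative layer sizes and hyperparameters; not the authors' code) of "naively" training an unstructured sparse network from random initialization: a fixed random binary mask is applied to the weights and re-applied after every optimizer step so that pruned connections stay at zero throughout training.

```python
# Minimal sketch (assumed setup): unstructured sparse training from a random
# initialization with a fixed random mask re-applied after every step.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
sparsity = 0.9  # fraction of weights forced to zero

# One random unstructured mask per weight matrix.
masks = {
    name: (torch.rand_like(p) > sparsity).float()
    for name, p in model.named_parameters()
    if p.dim() == 2
}

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def apply_masks():
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

apply_masks()  # sparsify the random initialization
for step in range(100):
    x = torch.randn(64, 784)           # stand-in for real training data
    y = torch.randint(0, 10, (64,))
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    apply_masks()  # keep pruned weights at zero throughout training
```

In this setup the mask is chosen at random; the paper's question (1) is why this recipe generalizes so much worse than the Lottery Ticket and Dynamic Sparse Training alternatives.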

Citations: cited by 36 publications (78 citation statements)
References: 30 publications
“…Our work identifies the mechanism by which IMP finds matching solutions: the algorithm retains information about the dense network's loss landscape by encoding it into the mask. Evci et al. (2022) report similar findings, but in a different setting: they construct sparse masks from a pruned solution (a sparse network trained to convergence) obtained through gradual magnitude pruning (GMP) during training (Zhu & Gupta, 2017), as opposed to IMP.…”
Section: A Related Work (supporting)
confidence: 59%
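For readers unfamiliar with gradual magnitude pruning, the sketch below (illustrative Python with assumed variable names; the cubic sparsity ramp follows Zhu & Gupta, 2017) shows how a sparsity level can be scheduled over training and how a magnitude-based mask, like the one the quoted passage says is reused for from-scratch training, could be extracted.

```python
# Hedged sketch of a GMP-style schedule: sparsity is ramped from s_initial to
# s_final with a cubic curve, and at each pruning step the smallest-magnitude
# weights are zeroed. Names are illustrative, not from the cited papers.
import torch

def gmp_sparsity(step, s_initial=0.0, s_final=0.9,
                 start_step=0, end_step=10_000):
    """Cubic ramp: s_t = s_f + (s_i - s_f) * (1 - progress)**3."""
    if step < start_step:
        return s_initial
    if step >= end_step:
        return s_final
    progress = (step - start_step) / (end_step - start_step)
    return s_final + (s_initial - s_final) * (1.0 - progress) ** 3

def magnitude_mask(weight, sparsity):
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

# Example: the mask at the end of the ramp can then be used to train a sparse
# network from scratch, as described in the quoted passage.
w = torch.randn(300, 784)
mask = magnitude_mask(w, gmp_sparsity(step=10_000))
print(mask.mean())  # roughly 0.1 of the weights remain at 90% sparsity
```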
“…The two closest related works to our results are … and Evci et al. (2022). Both works consider linear mode connectivity between two networks of the same sparsity.…”
Section: Retraining Finds Matching Subnetwork If SGD Is Robust To Per… (mentioning)
confidence: 67%
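The linear mode connectivity criterion mentioned in this statement can be made concrete with the following sketch (an assumed setup, not code from either cited work): it evaluates the loss along the straight line between two trained parameter sets that share a sparsity mask and reports the height of the loss barrier.

```python
# Hedged sketch: loss barrier along the linear path between two solutions
# theta_a and theta_b (state dicts of the same architecture and sparsity mask).
import torch

def interpolate_state(state_a, state_b, alpha):
    """Return (1 - alpha) * theta_a + alpha * theta_b for every tensor."""
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}

def loss_barrier(model, state_a, state_b, loss_fn, data, targets, steps=11):
    """Max loss on the linear path minus the mean of the endpoint losses."""
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        model.load_state_dict(interpolate_state(state_a, state_b, alpha))
        with torch.no_grad():
            losses.append(loss_fn(model(data), targets).item())
    endpoint_mean = 0.5 * (losses[0] + losses[-1])
    return max(losses) - endpoint_mean
```

A barrier close to zero is the usual operational definition of the two solutions being linearly mode connected.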
“…3) From-scratch with learned one-shot pruning pattern (Figure 1h) [13,11], which determines the sparsity pattern from the trained dense version and trains a sparse model from scratch. 4) From-scratch while learning sparsity pattern (Figure 1i) [51,6,14,26,9,4,34,58,10], which trains a sparse model from scratch while learning sparsity patterns simultaneously.…”
Section: Related Work (mentioning)
confidence: 99%
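As an illustration of the fourth category in the quoted taxonomy (learning the sparsity pattern while training from scratch), here is a hedged sketch of a single prune-and-regrow update in the spirit of RigL-style dynamic sparse training; the function name and the exact update rule are illustrative, not taken from the cited references.

```python
# Hedged sketch of one dynamic-sparse-training mask update: drop the
# smallest-magnitude active weights, regrow the same number of inactive
# connections with the largest gradient magnitude.
import torch

def prune_and_regrow(weight, grad, mask, update_fraction=0.1):
    active = mask.bool()
    n_update = int(update_fraction * active.sum().item())
    if n_update == 0:
        return mask

    # Prune: among active weights, select the smallest magnitudes.
    active_scores = torch.where(
        active, weight.abs(), torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(active_scores.flatten(), n_update, largest=False).indices

    # Grow: among inactive positions, select the largest gradient magnitudes.
    inactive_scores = torch.where(
        ~active, grad.abs(), torch.full_like(grad, -float("inf")))
    grow_idx = torch.topk(inactive_scores.flatten(), n_update, largest=True).indices

    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)
```

Calling this periodically during training keeps the total number of active connections fixed while letting the sparsity pattern itself adapt to the data.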
“…The irregularity of the sparsity pattern makes it challenging to leverage effectively on dense accelerators such as GPUs and TPUs. The sparsified models often end up with similar or worse performance (because of the extra complexity of compressing and decompressing the parameters) than their dense counterparts [2,32,43,21,30,15,59,50,10].…”
Section: Introduction (mentioning)
confidence: 99%
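The storage/compute trade-off alluded to in this statement can be illustrated with a small self-contained example (NumPy/SciPy, illustrative matrix sizes): an unstructured 90%-sparse matrix stored in CSR format uses far less memory, but the sparse product is not automatically faster than the dense one on commodity hardware.

```python
# Hedged illustration only: storage savings of CSR for an unstructured
# 90%-sparse matrix, with both layouts producing the same matmul result.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.standard_normal((2048, 2048)).astype(np.float32)
mask = rng.random((2048, 2048)) > 0.9          # keep roughly 10% of entries
pruned = dense * mask
csr = sparse.csr_matrix(pruned)

dense_bytes = pruned.nbytes
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(f"dense storage: {dense_bytes / 1e6:.1f} MB, "
      f"CSR storage: {csr_bytes / 1e6:.1f} MB")

x = rng.standard_normal((2048, 128)).astype(np.float32)
y_dense = pruned @ x
y_sparse = csr @ x
# Same result; wall-clock speed depends on hardware and sparsity structure.
assert np.allclose(y_dense, y_sparse, atol=1e-3)
```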