2019
DOI: 10.48550/arxiv.1906.10732
Preprint

The Difficulty of Training Sparse Neural Networks

Abstract: We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work of Gale et al. (2019) and Liu et al. (2018) has shown that sparse ResNet-50 architectures trained on the ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the failure of optimizers, there is a linear path with a monotonically decreasing objective from the initialization…
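The linear-path observation can be probed directly: evaluate the training objective at evenly spaced points on the straight line between the initial weights and a final (e.g. pruned) solution. Below is a minimal PyTorch sketch of such a probe; the model, batch, loss function, and the theta_init/theta_final parameter dictionaries are assumed placeholders, not code from the paper.

```python
import copy
import torch

def loss_on_batch(model, batch, loss_fn):
    """Loss of `model` on a single (inputs, targets) batch."""
    inputs, targets = batch
    with torch.no_grad():
        return loss_fn(model(inputs), targets).item()

def linear_path_losses(model, theta_init, theta_final, batch, loss_fn, steps=11):
    """Loss along theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final,
    for alpha in [0, 1]. theta_init/theta_final are dicts of parameter tensors,
    e.g. {n: p.detach().clone() for n, p in model.named_parameters()}."""
    probe = copy.deepcopy(model)
    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        with torch.no_grad():
            for name, param in probe.named_parameters():
                # Overwrite each parameter with its interpolated value.
                param.copy_((1 - alpha) * theta_init[name] + alpha * theta_final[name])
        losses.append(loss_on_batch(probe, batch, loss_fn))
    return losses
```

With theta_init captured before training and theta_final taken from a pruned solution, a monotonically decreasing list of losses along the path is what the abstract's claim would predict.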

Cited by 24 publications (29 citation statements) · References 5 publications
“…Starting from scratch, those methods learn to optimize the model weights together with the sparse connectivity simultaneously. [23,24] first introduced the Sparse Evolutionary Training (SET) technique [23], reaching superior performance compared to training with fixed sparse connectivity [72,27]. [28][29][30] leverage "weight reallocation" to improve the performance of the obtained sparse subnetworks.…”
Section: Related Work (mentioning)
confidence: 99%
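For context on the SET technique referenced in the passage above, here is a minimal sketch of its drop-and-grow topology update: prune a fraction zeta of the smallest-magnitude active weights, then regrow the same number of connections at random inactive positions so that overall sparsity stays fixed. This is an illustrative approximation rather than the original implementation; the zeta value and growing new weights at zero are assumptions (the original SET re-initializes grown weights randomly).

```python
import torch

def set_update(weight: torch.Tensor, mask: torch.Tensor, zeta: float = 0.3):
    """One SET-style topology update, applied in place to a contiguous weight
    tensor and its binary mask of the same shape."""
    active = mask.bool()
    n_drop = int(zeta * active.sum().item())
    if n_drop == 0:
        return weight, mask

    # Drop: remove the n_drop active weights with the smallest magnitude.
    magnitudes = weight.abs().masked_fill(~active, float("inf"))
    drop_idx = torch.topk(magnitudes.flatten(), n_drop, largest=False).indices
    mask.view(-1)[drop_idx] = 0
    weight.view(-1)[drop_idx] = 0.0

    # Grow: activate n_drop random connections among the inactive positions.
    inactive_idx = (~mask.bool()).flatten().nonzero().squeeze(1)
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_drop]]
    mask.view(-1)[grow_idx] = 1
    weight.view(-1)[grow_idx] = 0.0  # assumption: new connections start at zero
    return weight, mask
```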
“…However, this approach comes with severe drawbacks: 1) It does not allow the training phase and the inference phase to be designed independently of each other. This restricts interoperability, hinders efficient training, and compromises performance [7]. 2) Sparsity alone does not necessarily reduce computational cost, as it may require higher accuracy, i.e.…”
Section: Motivation (mentioning)
confidence: 99%
“…There are also works that study the vanishing gradient problem (VGP), which appears frequently in deep neural network training. Alford et al. [2019] and Evci et al. [2019] point out that the zero values in full-precision weights lead to VGP when training highly pruned neural networks, which eventually jeopardizes the accuracy of the model. Dong et al. [2017] and Zhou et al. [2017] suggest maintaining a portion of the full-precision weights during quantized network training to improve the performance of the model.…”
Section: Related Work (mentioning)
confidence: 99%
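A toy illustration (not code from the cited works) of the mechanism described above: with many zeroed weights, the backward signal is repeatedly multiplied by mostly-zero matrices, so the gradient reaching early layers shrinks as sparsity grows. The layer count, width, and sparsity level below are arbitrary assumptions, and nonlinearities are omitted for brevity.

```python
import torch
import torch.nn as nn

def input_grad_norm(sparsity: float, depth: int = 10, width: int = 256) -> float:
    """Gradient norm at the input of a stack of randomly masked linear layers."""
    torch.manual_seed(0)
    layers = nn.Sequential(*[nn.Linear(width, width, bias=False) for _ in range(depth)])
    with torch.no_grad():
        for layer in layers:
            mask = (torch.rand_like(layer.weight) > sparsity).float()
            layer.weight.mul_(mask)  # zero out roughly a `sparsity` fraction of weights
    x = torch.randn(8, width, requires_grad=True)
    layers(x).sum().backward()
    return x.grad.norm().item()

print("dense     :", input_grad_norm(0.0))
print("90% sparse:", input_grad_norm(0.9))
```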