Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks
Preprint, 2018
DOI: 10.48550/arxiv.1812.08119

Cited by 2 publications (6 citation statements). References 9 publications.
“…As an alternative to SGD we also evaluated our regularization strategy against Adam optimization (Table 2). In alignment with the reported tendency of this adaptive method [11], our visualization clearly indicates high sparsity (Figure A.1). We further demonstrate that sparsity regularization can cause strongly correlating features if applied to method-induced sparsity (Figure A.2).…”
Section: Targeted Sparsity Regularization (supporting; confidence: 86%)
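The statement above refers to the tendency of Adam to switch off hidden units in ReLU networks. As a rough illustration only (our own sketch, not the experimental setup of the cited paper or of this preprint), the following trains a small ReLU MLP with Adam plus a little weight decay on synthetic data and reports how many hidden units have incoming weight vectors that have collapsed toward zero; architecture, step count, and the 1e-2 threshold are arbitrary choices.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(2048, 64)
y = torch.randint(0, 10, (2048,))

model = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

# A hidden unit is counted as "switched off" when the L2 norm of its incoming
# weights falls below a small threshold; in practice the effect reported in the
# preprint emerges over longer training than this toy run.
for i, layer in enumerate(m for m in model if isinstance(m, nn.Linear)):
    norms = layer.weight.norm(dim=1)
    frac = (norms < 1e-2).float().mean().item()
    print(f"layer {i}: {frac:.1%} of units near zero")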
“…Given inappropriate combinations of network size and training data special measures have to be considered to prevent overfitting [8]. Popular countermeasures include (i) artificially increasing the amount of training data [9,10]; (ii) reducing the network's capacity; and (iii) changing the learning strategy [11]. The reduction of the capacity can either be done explicitly by reducing the amount of learnable parameters (i.e.…”
Section: Introduction (mentioning; confidence: 99%)
“…With the finding of increased sparsity due to BN + ReLU, a natural question is whether the sparsity is favorable for the network. Several recent works treated the sparsity as a windfall in network training since it automatically generates a pruned network [20,21]. However, in this work, we show that the accuracy of the sparsified network only matches that using uniform channel pruning, suggesting detrimental effects of the collapsed filters.…”
Section: Introduction (mentioning; confidence: 58%)
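The "collapsed filters" mentioned above can be read off directly from BatchNorm parameters. A minimal sketch, assuming a standard Conv-BN-ReLU block (our construction, not code from the cited work): channels whose BN scale gamma is near zero produce an essentially input-independent output after BN + ReLU and are the natural candidates for the automatic pruning the citing authors describe; the 1e-3 threshold is illustrative.

import torch
import torch.nn as nn

torch.manual_seed(0)
block = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU())

bn = block[1]
with torch.no_grad():
    collapsed = bn.weight.abs() < 1e-3  # gamma close to zero => channel carries no input-dependent signal
print(f"{int(collapsed.sum())} of {bn.num_features} channels look collapsed")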
“…Several recent studies have investigated the dying probability with network depth [25]. The ReLU-related network sparsity in neural networks has also been noticed by two contemporary works [21,20], yet with no special attention to batch normalization. The sparsification change with different hyper-parameters such as LR, weight decay and training algorithms was experimentally analyzed in [21].…”
Section: Dying ReLU (mentioning; confidence: 98%)
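For the "dying ReLU" phenomenon discussed above, a simple diagnostic is to count hidden units whose post-ReLU activation is zero for every input in a probe batch. The sketch below is our own illustration of that measurement (a freshly initialized network will report close to 0% dead units); model and batch size are arbitrary.

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
probe = torch.randn(1024, 64)

with torch.no_grad():
    acts = model[1](model[0](probe))  # post-ReLU activations of the hidden layer
    dead = (acts == 0).all(dim=0)     # never active on the probe batch
print(f"{dead.float().mean().item():.1%} of hidden units are dead on this batch")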