2021
DOI: 10.48550/arxiv.2110.11804
Preprint

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

Abstract: We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks, i.e., masks which zero out each weight independently with some weight-specific probability. We analyze the training dynamics of the induced stochastic predictor in the setting of linear regression, and observe a data-adaptive L1 regularization term, in contrast to the data-adaptive L2 regularization term known to underlie dropout in linear regression. We also observe a preference to prune weights that are…
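As a rough illustration only (not the authors' code), the sketch below shows what optimizing the expected loss of a stochastic pruning mask can look like in the linear-regression setting the abstract describes: each weight w_i is kept with a learnable probability p_i (independent Bernoulli masks), the exact expected squared loss splits into the error at the mean prediction plus a data-dependent mask-variance penalty, and the keep probabilities are trained by projected gradient descent. The dataset, pretrained weights, and hyperparameters here are purely illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: stochastic pruning masks in linear regression.
# Each weight w_i is kept with probability p_i (m_i ~ Bernoulli(p_i),
# independent). We minimize the exact expected squared loss over the
# keep probabilities p by projected gradient descent.

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = np.concatenate([rng.normal(size=5), np.zeros(d - 5)])  # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.linalg.lstsq(X, y, rcond=None)[0]   # "pretrained" weights to be pruned
p = np.full(d, 0.5)                        # per-weight keep probabilities
col_sq = (X ** 2).sum(axis=0)              # sum_j x_{ji}^2, used in the variance term

def expected_loss(p):
    # E_m[(y - X (m * w))^2] = squared error at the mean prediction
    # plus a mask-variance penalty sum_i x_i^2 w_i^2 p_i (1 - p_i).
    resid = y - X @ (p * w)
    var = (X ** 2) @ (w ** 2 * p * (1.0 - p))
    return np.mean(resid ** 2 + var)

lr = 0.05
for _ in range(500):
    resid = y - X @ (p * w)
    # Closed-form gradient of the expected loss with respect to p.
    grad = (-2.0 * w * (X.T @ resid) + col_sq * w ** 2 * (1.0 - 2.0 * p)) / n
    p = np.clip(p - lr * grad, 0.0, 1.0)   # projected step back onto [0, 1]

print("expected loss:", round(expected_loss(p), 4))
print("keep probabilities:", np.round(p, 2))  # low entries are weights the mask tends to drop
```

This only illustrates the stochastic-mask objective itself; the paper's analysis of the induced training dynamics and the data-adaptive L1 effect is not reproduced here.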

Cited by 1 publication (3 citation statements)
References 24 publications
“…Although it is limited to the case of a one-hidden-layer neural network, they provided insight into why sparsity increases generalization ability. Within PAC-Bayesian theory, Hayou et al. [15] also used spike-and-slab prior and posterior distributions. However, their motivation and purpose are completely different from our work.…”
Section: Related Work
confidence: 99%
“…The authors adopt a problem setting in which weights trained for a few epochs, rather than the initial weights, are used for the ticket search, following Frankle et al. [11]; their work therefore does not have to consider suppressing the learning rate, i.e., the distance from the initial weights. Note that Hayou et al. [15] proposed PAC-Bayes pruning (PBP) by optimizing the PAC-Bayes bound. A limitation of our analysis, however, is that we cannot reveal an explicit relationship between continuous sparsification and PBP.…”
Section: Continuous Sparsification
confidence: 99%