2018
DOI: 10.48550/arxiv.1804.05862
Preprint

Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach

Abstract: Modern neural networks are highly overparameterized, with capacity to substantially overfit to training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be "compressed" to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization bound for compressed networks based on the compressed size that, combined with off-the-shelf compression algorithms…
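
The abstract's core idea, that compressed description length controls generalization, can be illustrated with a much simpler classical bound. The sketch below is not the paper's theorem (which uses a PAC-Bayesian argument); the function name `occam_bound` and the numbers are illustrative only, showing how a bit-length term enters a generalization guarantee.

```python
import math


def occam_bound(empirical_error: float, compressed_bits: float,
                num_samples: int, delta: float = 0.05) -> float:
    """Classical Occam's-razor bound for a classifier describable in
    `compressed_bits` bits: with probability >= 1 - delta,
        true_error <= empirical_error
                      + sqrt((bits * ln 2 + ln(1/delta)) / (2 * num_samples)).
    A simplified stand-in for the paper's PAC-Bayesian compression bound.
    """
    slack = math.sqrt(
        (compressed_bits * math.log(2) + math.log(1.0 / delta))
        / (2.0 * num_samples)
    )
    return empirical_error + slack


# Illustrative numbers only: a model whose compressed description is 100 kB,
# trained on one million examples with 30% empirical error.
print(occam_bound(empirical_error=0.30,
                  compressed_bits=100_000 * 8,
                  num_samples=1_000_000))  # ~0.83
```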

Cited by 21 publications (17 citation statements)
References 15 publications

“…One response to this criticism is a computational framework from Dziugaite and Roy (2017). That work shows that directly optimizing the PAC-Bayes bound leads to a much smaller bound and low test error simultaneously (see also Zhou et al (2018) for a large-scale study). The recent work of Jiang et al (2020) further compared different complexity notions and noted that the ones given by PAC-Bayes tools correlate better with empirical performance.…”
Section: Discussion
confidence: 81%
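
For context, the quantity these computational approaches optimize is typically a PAC-Bayes bound of the following standard (Langford–Seeger/Maurer) form; this is a textbook statement rather than a result specific to any of the papers cited above.

```latex
% PAC-Bayes-kl bound: for a prior P chosen before seeing the i.i.d. sample S
% of size m, with probability at least 1 - \delta over S, simultaneously for
% all posteriors Q over hypotheses,
\[
  \mathrm{kl}\!\left(\hat{L}_S(Q) \,\middle\|\, L_{\mathcal{D}}(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{m},
\]
% where \hat{L}_S(Q) and L_{\mathcal{D}}(Q) are the empirical and population
% risks of the Gibbs classifier drawn from Q, and kl(q || p) is the binary
% KL divergence. "Directly optimizing the bound" means choosing Q (e.g. a
% Gaussian over network weights) so as to minimize the resulting upper
% bound on L_{\mathcal{D}}(Q).
```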
“…Other Approaches to Generalization. There are some approaches beyond stability and uniform convergence, including PAC-Bayes [15,16,40,41,50,55], information-based bound [5,23,44,46,49], and compression-based bound [2][3][4].…”
Section: Related Work
confidence: 99%
“…[17,18,19], and have been employed more specifically for neural networks in e.g. [20,21,22,23], but again these bounds are not multiscale. The paper [24] combines PAC-Bayes bounds with generic chaining and obtains multiscale bounds that rely on auxiliary sample sets, however, an important difference between our generalization bound and [24] is that our bound puts forward the multiscale entropic regularization of the empirical risk, for which we can characterize the minimizer exactly.…”
Section: Further Relations With Prior Work
confidence: 99%