Generalization bounds via distillation
Preprint, 2021
DOI: 10.48550/arxiv.2104.05641

Abstract: This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds. The main contribution is an analysis showing that the original network inherits this good generalization bound from its distillation, assuming the use of well-behaved data augmentation. This bound is presented both in an abstract and in a concrete …
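The setup the abstract describes can be sketched with ordinary knowledge distillation. The Python sketch below is an illustration only, not the paper's code: the architectures, synthetic data, and hyperparameters are hypothetical stand-ins. It trains a small student to match a much larger teacher's outputs and then measures how often the two predict the same label, roughly the "nearly identical predictions" condition the abstract refers to.

```python
# Minimal distillation sketch (illustration only, not the paper's code).
# Architectures, data, and hyperparameters are hypothetical stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# High-complexity "teacher" and low-complexity "student".
teacher = nn.Sequential(
    nn.Linear(20, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3),
)
student = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))

x = torch.randn(512, 20)        # synthetic inputs standing in for (augmented) data
with torch.no_grad():
    t_logits = teacher(x)       # fixed teacher predictions the student must match

opt = torch.optim.Adam(student.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    # Distillation objective: KL divergence between teacher and student outputs.
    loss = F.kl_div(F.log_softmax(student(x), dim=1),
                    F.softmax(t_logits, dim=1),
                    reduction="batchmean")
    loss.backward()
    opt.step()

with torch.no_grad():
    agreement = (student(x).argmax(1) == t_logits.argmax(1)).float().mean().item()
print(f"teacher/student label agreement: {agreement:.3f}")
```

If the two models agree on essentially all (augmented) inputs, the paper's result, as summarized in the abstract, lets the high-complexity teacher inherit a generalization bound governed by the far smaller student's complexity.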

Cited by 2 publications (1 citation statement)
References 16 publications
“…Different from the classical bound, we compress the hypothesis space F into a smaller space derived from the teacher feature extractor. Hsu et al. [37] point out that distillation helps to derive a fine-grained analysis of the prediction risk and to avoid a vacuous generalization bound. Therefore, we take the following theorem as a starting point to derive robustness transfer.…”
Section: Theoretical Analysis (mentioning)
confidence: 99%