2014
DOI: 10.48550/arxiv.1412.6614
Preprint
In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

Abstract: We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multi-layer feedforward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.

Cited by 128 publications (188 citation statements)
References 6 publications
“…An intriguing empirical phenomenon called "double descent" has recently emerged in the study of overparameterized learning models (Neyshabur et al., 2014; Nakkiran et al., 2019; Belkin et al., 2018, 2019; Belkin, 2021). Consider, for example, a risk curve that depicts how the generalization error varies as more parameters are added to the model.…”
Section: Motivation: A Multi-Descent Phenomenon
confidence: 99%
“…Ma et al. (2021) show that even without explicit regularization, when the learning rate of GD is infinitesimal, i.e., GD approximating gradient flow, X and Y maintain the gap in their magnitudes. Such an effect is more broadly recognized as implicit bias of learning algorithms (Neyshabur et al., 2014; Gunasekar et al., 2018; Soudry et al., 2018). Built upon this implicit bias, Du et al. (2018) further prove that GD with diminishing learning rates converges to a bounded global minimum of (1), and this conclusion is recently extended to the case of a constant small learning rate (Ye & Du, 2021).…”
Section: Background and Related Work
confidence: 99%
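
The magnitude-gap property mentioned in this excerpt can be checked numerically. In the sketch below (our construction, assuming the standard factorization objective L(X, Y) = ||X Yᵀ − M||²_F / 2), gradient flow exactly conserves the "balancedness" gap XᵀX − YᵀY, and small-learning-rate GD approximately preserves it while the loss decreases.

```python
import numpy as np

# Hedged sketch of the implicit bias of GD on matrix factorization
# (not the cited papers' code): with a small learning rate, the gap
# X^T X - Y^T Y barely moves even as the factors grow during training.

rng = np.random.default_rng(0)
n, r = 10, 3
M = rng.standard_normal((n, n))
X = 0.1 * rng.standard_normal((n, r))   # small, nearly balanced init
Y = 0.1 * rng.standard_normal((n, r))

gap0 = X.T @ X - Y.T @ Y                # conserved under gradient flow
loss_init = 0.5 * np.linalg.norm(X @ Y.T - M) ** 2

lr = 1e-3                               # small lr ~ gradient flow
for _ in range(2000):
    R = X @ Y.T - M                     # residual
    X, Y = X - lr * R @ Y, Y - lr * R.T @ X   # simultaneous GD update

gap1 = X.T @ X - Y.T @ Y
loss_final = 0.5 * np.linalg.norm(X @ Y.T - M) ** 2
drift = np.linalg.norm(gap1 - gap0)     # stays small while the loss drops
```

The conservation follows from d/dt (XᵀX − YᵀY) = 0 under the gradient-flow dynamics Ẋ = −RY, Ẏ = −RᵀX; discrete GD adds only an O(lr²) drift per step.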
“…By construction, the empirical risk is always non-negative, and hence so is the training error: min_{θ∈Θ} R_n(θ) ≥ 0. However, state-of-the-art models in machine learning are often overparametrized, in the sense that they can achieve vanishing, or nearly vanishing, training error even with noisy labels: min_{θ∈Θ} R_n(θ) = 0 [NTS14]. The emphasis on noise is crucial here: classically one would expect vanishing empirical risk only in the absence of noise, i.e., if we had E{ℓ(y; f*(x))} = 0 for some function f* that happens to be in the model class {f(·; θ)}_{θ∈Θ}.…”
Section: Introduction
confidence: 99%
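
The interpolation phenomenon this excerpt describes is easy to demonstrate. In the sketch below (our illustration, not code from [NTS14]), a model with more random features than samples drives the training error to essentially zero even though the labels are noisy: the least-squares problem is underdetermined, and the minimum-norm solution interpolates the noisy training labels.

```python
import numpy as np

# Hedged illustration of zero training error under label noise:
# p random features > n samples, solved by minimum-norm least squares.

rng = np.random.default_rng(1)
n, d, p = 50, 5, 200                 # samples, inputs, features (p > n)
X = rng.standard_normal((n, d))
y = X[:, 0] + 0.5 * rng.standard_normal(n)   # signal plus label noise

W = rng.standard_normal((d, p))      # fixed random feature map
Phi = np.tanh(X @ W)                 # n x p feature matrix, rank n a.s.
theta = np.linalg.lstsq(Phi, y, rcond=None)[0]   # min-norm interpolant

train_mse = np.mean((Phi @ theta - y) ** 2)      # vanishes despite noise
```

Because Phi has full row rank almost surely, an exact interpolant exists, so the training error reaches numerical zero regardless of how noisy y is.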