2019
DOI: 10.1073/pnas.1903070116

Reconciling modern machine-learning practice and the classical bias–variance trade-off

Abstract: Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias–variance trade-off, appears to be at odds with the observed behavior of methods used in modern machine-learning practice. The bias–variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting …

Cited by 1,136 publications (1,048 citation statements)
References 22 publications (22 reference statements)
“…We take n = 10 noisy measurements of this signal. Example 5 is directly inspired by feature models 17 in several recent papers [7,8,52], as well as having philosophical connections to other recent papers [6,46]. Figure 3 shows the performance of the minimum-ℓ2-norm interpolator on Example 5 as we increase the number of features.…”
Section: The Minimum-ℓ2-Norm Interpolator Through the Fourier Lens (mentioning)
Confidence: 97%
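For readers unfamiliar with the term, here is a minimal sketch of what a minimum-ℓ2-norm interpolator is in the overparameterized regime: with more features than samples, infinitely many weight vectors fit the data exactly, and the pseudoinverse picks the one with the smallest ℓ2 norm. The toy signal, noise level, and Fourier feature map below are illustrative assumptions, not the exact "Example 5" of the citing paper.

```python
# Minimal sketch (assumed toy setup, not the cited paper's Example 5):
# the minimum-l2-norm interpolator for an overparameterized linear model.
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # n = 10 noisy measurements, as in the quote
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)   # assumed signal + noise

def fourier_features(x, p):
    """Assumed feature map: the first p real Fourier features on [0, 1]."""
    cols = [np.ones_like(x)]
    for k in range(1, p // 2 + 1):
        cols += [np.cos(2 * np.pi * k * x), np.sin(2 * np.pi * k * x)]
    return np.column_stack(cols)[:, :p]

p = 40                                   # more features than samples -> exact interpolation
X = fourier_features(x, p)
w = np.linalg.pinv(X) @ y                # minimum-l2-norm solution of X w = y
print("max training residual:", np.max(np.abs(X @ w - y)))   # ~0: the fit interpolates
```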
“…Most recently, a double-descent curve in the test error (0–1 loss and MSE) as a function of the number of parameters of several parametric models was observed on several common datasets by physicists [37] and machine learning researchers [38], respectively. In these experiments, the minimum-ℓ2-norm interpolating solution is used, and several feature families, including kernel approximators [39], were considered.…”
Section: High-Dimensional Linear Regression (mentioning)
Confidence: 99%
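A hedged sketch of the kind of double-descent experiment this statement describes follows: test MSE of the minimum-ℓ2-norm least-squares solution over random features as the feature count p passes the interpolation threshold p = n. The dataset, ReLU random-feature map, and noise level are assumptions for illustration, not the setups of [37], [38], or [39].

```python
# Assumed toy double-descent experiment: sweep the number of random features p
# and fit each model with the minimum-l2-norm least-squares solution (pinv).
# Test MSE typically rises toward the interpolation threshold p = n_train and
# falls again beyond it, giving the double-descent shape described above.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, d, noise = 30, 500, 5, 0.5

def make_data(n):
    X = rng.standard_normal((n, d))
    y = np.sin(X @ np.ones(d)) + noise * rng.standard_normal(n)
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

def relu_features(X, W):
    """Random-feature map (cf. the kernel approximators mentioned in the quote)."""
    return np.maximum(X @ W, 0.0)

for p in [5, 10, 20, 30, 40, 80, 160, 320]:
    W = rng.standard_normal((d, p)) / np.sqrt(d)   # fixed random first layer
    Ftr, Fte = relu_features(Xtr, W), relu_features(Xte, W)
    w = np.linalg.pinv(Ftr) @ ytr                  # min-l2-norm least-squares fit
    mse = np.mean((Fte @ w - yte) ** 2)
    print(f"p = {p:4d}   test MSE = {mse:.3f}")
```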
“…symmetries, random shifts and crops, cutout / random erasing, mixup), but note that some techniques such as mixup have not been used for regression, though in some cases they are worth trying for your specific case. [49] show that, unlike traditional regression approaches, once NNs become sufficiently wide and deep they can generalize and do not overfit despite such a large number of parameters.…”
Section: Advanced Control Methods (mentioning)
Confidence: 99%
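As a small illustration of one augmentation named in this statement, below is a hedged sketch of mixup applied to a regression batch; the mixing parameter alpha and the toy data are assumptions, not choices from [49].

```python
# Hedged sketch of mixup for regression: form convex combinations of input/target
# pairs with a Beta(alpha, alpha)-distributed mixing weight.
import numpy as np

rng = np.random.default_rng(2)

def mixup_batch(X, y, alpha=0.2):
    """Return a mixed copy of the batch; alpha = 0.2 is an assumed default."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(X))
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return X_mix, y_mix

X = rng.standard_normal((8, 3))          # toy batch: 8 examples, 3 features
y = X.sum(axis=1)                        # toy regression target
Xm, ym = mixup_batch(X, y)
print(Xm.shape, ym.shape)                # (8, 3) (8,)
```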