2022
DOI: 10.1214/21-AOS2133

Surprises in high-dimensional ridgeless least squares interpolation

Cited by 202 publications (124 citation statements): 6 supporting, 118 mentioning, 0 contrasting. References 37 publications.
“…The work of Fan and Lv [67] focused on variable selection and the asymptotic behavior of their novel iterative sure independence screening under varying high-dimensional or ultrahigh-dimensional settings. During the preparation of the current version (the first draft was uploaded to arXiv in 2017 [69]), it has been found that the optimal ridge penalty is positive when the coefficients are generated from a distribution [32,33], whereas the optimal ridge penalty could be negative when the coefficients are fixed [34]. These studies rely on random matrix theory and assumptions on the covariance structure of the explanatory variables.…”
Section: Discussion (mentioning, confidence: 99%)
“…For prediction in deep learning and estimation, the results are mixed. For example, the optimal ridge penalty is found to be positive when the coefficients are generated from a distribution [32,33], whereas the optimal ridge penalty might be negative when the coefficients are fixed [34]. However, in practice, the true model is never known.…”
Section: Introduction (mentioning, confidence: 99%)
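The sign of the optimal ridge penalty is easy to probe numerically. Below is a minimal sketch, not taken from the cited papers: it assumes a Gaussian design with one high-variance direction and a fixed coefficient vector aligned with that direction (the spiked regime in which [34] reports negative optimal penalties), and scans the ridge estimator's test error over a penalty grid that includes negative values. All sizes, scales, and the seed are illustrative assumptions.

```python
import numpy as np

# Sketch only: spiked covariance with a fixed coefficient vector aligned to
# the strong direction; sizes, scales, and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
n, p, sigma = 100, 300, 1.0

scales = np.ones(p)
scales[0] = 10.0                # one high-variance feature direction
beta = np.zeros(p)
beta[0] = 1.0                   # fixed coefficients, aligned with it

X = rng.standard_normal((n, p)) * scales
y = X @ beta + sigma * rng.standard_normal(n)
X_test = rng.standard_normal((5000, p)) * scales
y_test = X_test @ beta          # noiseless targets: test MSE = prediction risk

for lam in [-40.0, -10.0, 0.1, 1.0, 10.0, 100.0]:
    # ridge estimator; for negative lam the matrix is indefinite but still
    # invertible as long as lam avoids the nonzero eigenvalues of X'X
    beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    mse = np.mean((X_test @ beta_hat - y_test) ** 2)
    print(f"lambda = {lam:7.1f}   test MSE = {mse:.4f}")
```

Depending on the covariance spike and the alignment of the fixed coefficients, the smallest error on such a grid can land at a negative penalty, the phenomenon the quoted passages attribute to [34]; with coefficients redrawn from a distribution, the minimizer moves to a positive penalty, as in [32,33].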
“…In particular, impressive results in image classification, pattern recognition and feature extraction were achieved by means of deep convolutional neural networks. Borrowing from Wigner, the efforts of many researchers are currently directed toward explaining the "unreasonable effectiveness" of these models and related intriguing phenomena, such as the double descent error curve [3,12,21,23] or the instability to adversarial attacks [1,8,10,14,28].…”
Section: Introduction (mentioning, confidence: 99%)
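The double descent curve mentioned in this excerpt can be reproduced with a few lines of linear algebra. The following is a hedged sketch under assumed conditions (Gaussian features, a dense linear signal, minimum-norm least squares via the pseudoinverse; sizes and seed are arbitrary): test error first falls, peaks near the interpolation threshold p = n, then descends again as p grows.

```python
import numpy as np

# Sketch of double descent for minimum-norm least squares; all sizes and the
# seed are illustrative assumptions, not values from the cited papers.
rng = np.random.default_rng(1)
n, p_max, sigma = 100, 400, 0.5
beta_full = rng.standard_normal(p_max) / np.sqrt(p_max)   # dense signal

X = rng.standard_normal((n, p_max))
y = X @ beta_full + sigma * rng.standard_normal(n)
X_test = rng.standard_normal((5000, p_max))
y_test = X_test @ beta_full

for p in [20, 50, 80, 95, 100, 105, 120, 200, 400]:
    # fit on the first p features only; the pseudoinverse gives ordinary
    # least squares for p < n and the minimum-norm interpolator for p >= n
    beta_hat = np.linalg.pinv(X[:, :p]) @ y
    mse = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    print(f"p = {p:3d}   test MSE = {mse:.3f}")
```

The spike near p = n and the second descent beyond it trace out the "double descent" shape the excerpt refers to.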
“…The most representative approach involves the use of sparsity via ℓ1-norm regularization and its variants (Candes and Tao, 2007; Van de Geer, 2008; Bühlmann and Van De Geer, 2011; Hastie et al., 2019), which is effective when the signal to be estimated has many zero elements. Another line of study investigated the high-dimensional limit of the risks of estimators (Dobriban and Wager, 2018; Belkin et al., 2019; Hastie et al., 2022; Bartlett et al., 2020) and revealed the limiting risk when the number of data instances n and the number of parameters p diverge while their ratio p/n converges to a positive constant. Interpolators are a typical estimator that perfectly fits the observed data in the p ≫ n setting; moreover, it has been demonstrated that the risk or…”
(mentioning, confidence: 99%)
“…variance of interpolators converges to zero (Liang and Rakhlin, 2020; Hastie et al., 2022; Ba et al., 2019). It should be noted that these studies did not use the sparsity of data or signals; however, they are compatible with recent large-scale data analyses.…”
(mentioning, confidence: 99%)
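A worked formula may make the quoted claim concrete. In the isotropic special case (Σ = I) treated by Hastie et al. (2022), with signal strength r² = ‖β‖², noise level σ², and aspect ratio γ = p/n, the limiting out-of-sample risk of the minimum-norm interpolator takes the following form; this is a sketch of the isotropic case only, stated here to show why the variance piece vanishes:

```latex
\[
R(\gamma) \;\longrightarrow\;
\begin{cases}
\sigma^{2}\,\dfrac{\gamma}{1-\gamma}, & \gamma < 1,\\[2ex]
r^{2}\!\left(1-\dfrac{1}{\gamma}\right) + \dfrac{\sigma^{2}}{\gamma-1}, & \gamma > 1.
\end{cases}
\]
% The variance term sigma^2/(gamma - 1) vanishes as gamma grows, which is
% the sense in which "the variance of interpolators converges to zero";
% the bias term r^2 (1 - 1/gamma) does not vanish, approaching r^2.
```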