2020
DOI: 10.48550/arxiv.2006.05800
Preprint
On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Abstract: We consider the linear model $y = X\beta + \epsilon$ with $X \in \mathbb{R}^{n \times p}$ in the overparameterized regime $p > n$. We estimate $\beta$ via generalized (weighted) ridge regression: $\hat{\beta}_\lambda = (X^\top X + \lambda \Sigma_w)^\dagger X^\top y$, where $\Sigma_w$ is the weighting matrix. Assuming a random effects model with general data covariance $\Sigma_x$ and an anisotropic prior on the true coefficients $\beta$, i.e., $\mathbb{E}[\beta \beta^\top] = \Sigma_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y - x^\top \hat{\beta}_\lambda)^2$ in the proportional asymptotic limit $p/n \to \gamma \in (1, \infty)$. Our general setup leads to a number of intere…
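To make the estimator concrete, here is a minimal NumPy sketch of the weighted ridge formula from the abstract. The dimensions, noise level, and the identity choice of $\Sigma_w$ are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Sketch of the generalized (weighted) ridge estimator from the abstract:
#   beta_hat = (X^T X + lambda * Sigma_w)^+ X^T y
# All concrete numbers below (n, p, lam, noise scale) are assumptions
# chosen only to illustrate the overparameterized regime p > n.

rng = np.random.default_rng(0)
n, p = 100, 300                      # overparameterized: p > n, gamma = p/n = 3
lam = 1.0                            # regularization strength lambda

X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

Sigma_w = np.eye(p)                  # weighting matrix; identity recovers plain ridge

# Pseudo-inverse as in the paper's formula; for lam > 0 and positive
# definite Sigma_w the matrix is invertible, so pinv coincides with inv.
beta_hat = np.linalg.pinv(X.T @ X + lam * Sigma_w) @ X.T @ y

# Out-of-sample prediction risk E(y - x^T beta_hat)^2, estimated on fresh data.
X_test = rng.standard_normal((n, p))
y_test = X_test @ beta_true + 0.1 * rng.standard_normal(n)
print(np.mean((y_test - X_test @ beta_hat) ** 2))
```

A non-identity $\Sigma_w$ (e.g., a diagonal matrix) penalizes coordinates unevenly, which is the degree of freedom the paper optimizes over.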

Cited by 7 publications (7 citation statements)
References 36 publications
“…Termed “benign overfitting” by [2], this phenomenon has been studied analytically in the framework of linear regression with isotropic noise. It was extended to the anisotropic case later in [16], where the authors proposed the fruitful idea of “alignment” and “misalignment” between the signal and the covariance matrix of the noise.…”
Section: Related Work
mentioning, confidence: 99%
“…While we were finishing this work, we became aware of parallel work by Wu and Xu [2020], who derive asymptotic risk predictions for ridge regression with general quadratic penalties that subsume the diagonal, group-wise constant penalty matrices that we consider. When specialized to our setting, their asymptotic risk formulae are less natural than ours, as they are not phrased in terms of λ_1, ….”
Section: Using a Single Regularization Parameter
mentioning, confidence: 99%
“…Existing research has been focused on studying the learning risk behavior of the interpolator (the estimator that interpolates the training data) under two regimes. The first regime investigates the asymptotic learning risk of the interpolator (Mei and Montanari, 2019; Hastie et al., 2019; Liao et al., 2020; Wu and Xu, 2020; Richards et al., 2021). These works derive the asymptotic learning risk by assuming that the data dimension d and the number of training samples n (and the number of parameters s in the non-linear regression case) grow simultaneously while their ratio is kept fixed.…”
Section: Introduction
mentioning, confidence: 99%