2014
DOI: 10.1093/biomet/asu034

When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples

Abstract: Regularization aims to improve the prediction performance of a given statistical modeling approach by moving to a second approach that achieves worse training error but is expected to have fewer degrees of freedom, i.e., better agreement between training and prediction error. We show here, however, that this expected behavior does not hold in general. In fact, counterexamples are given which show that regularization can increase the degrees of freedom in simple situations, including lasso and ridge regression, which a…

Cited by 13 publications (13 citation statements)
References 27 publications
“…Second, even in situations in which the Lagrange and constrained forms of a particular optimization problem are equivalent (e.g., this is true under strong duality, and so it is true for most convex problems, under very weak conditions), there is a difference between studying the degrees of freedom of an estimator defined in one problem form versus the other. This is because the map from the Lagrange parameter in one form to the constraint bound in the other generically depends on y, i.e., it is a random mapping (Kaufman & Rosset (2013) discuss this for ridge regression and the lasso). Lastly, in this paper, we focus on the Lagrange form (4) of subset selection because we find this problem is easier to analyze mathematically.…”
Section: Lagrange Versus Constrained Problem Forms (mentioning)
confidence: 99%
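To make the quoted point concrete, here is a minimal sketch (not taken from either paper; the design X, the sparse signal, and the penalty level are illustrative assumptions). For the lasso with a fixed Lagrange parameter, the equivalent constraint bound t(λ, y) = ‖β̂(λ, y)‖₁ changes with the draw of y, so the map from Lagrange parameter to constraint bound is a random mapping.

```python
# Minimal sketch (illustrative, not from either paper): for the lasso with a
# fixed Lagrange parameter lambda, the equivalent constraint bound
# t(lambda, y) = ||beta_hat(lambda, y)||_1 changes from one draw of y to the
# next, so the map lambda -> t is a random mapping.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 2.0                            # illustrative sparse signal
lam = 0.5                                 # fixed Lagrange parameter

for rep in range(3):                      # three independent draws of y
    y = X @ beta + rng.standard_normal(n)
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X, y)
    t = np.abs(fit.coef_).sum()           # constraint bound that reproduces this fit
    print(f"draw {rep}: t(lambda={lam}, y) = {t:.3f}")
```

Each draw of y yields a different bound t, so fixing λ does not fix a single constrained-form problem.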
“…It is worth mentioning the interesting, recent works of Kaufman & Rosset (2013) and Janson et al (2013), which investigate unexpected nonmonotonicities in the (total) degrees of freedom of an estimator, as a function of some underlying parametrization for the amount of imposed regularization. We note that the right panel of Figure 3 portrays a definitive example of this, in that the best subset selection degrees of freedom undergoes a major nonmonotonicity at 10 (expected) active variables, as discussed above.…”
Section: Example: Sparse Signal (mentioning)
confidence: 99%
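As a generic illustration of how such degrees-of-freedom curves can be examined empirically (this is a simulation sketch, not the experiment behind Figure 3 of the quoted paper; the design X, mean, noise level, and λ grid are assumptions), one can estimate df(λ) = Σᵢ Cov(ŷᵢ, yᵢ)/σ² by Monte Carlo over a grid of regularization levels and inspect whether the curve is monotone:

```python
# Generic simulation sketch (assumptions: Gaussian noise with known sigma,
# an arbitrary design X, mean mu, and lambda grid chosen for illustration):
# Monte Carlo estimate of df(lambda) = sum_i Cov(yhat_i, y_i) / sigma^2 for
# the lasso over a grid of penalty levels.  Whether the resulting curve is
# monotone in lambda depends on the design and mean.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, sigma = 30, 8, 1.0
X = rng.standard_normal((n, p))
mu = X @ np.concatenate([np.full(2, 3.0), np.zeros(p - 2)])   # true mean
lam_grid = np.linspace(0.05, 2.0, 20)
reps = 500                                # more replications -> smoother estimates

fits = np.zeros((reps, len(lam_grid), n))
ys = np.zeros((reps, n))
for r in range(reps):
    y = mu + sigma * rng.standard_normal(n)
    ys[r] = y
    for j, lam in enumerate(lam_grid):
        fits[r, j] = Lasso(alpha=lam, fit_intercept=False).fit(X, y).predict(X)

# df(lambda) via the covariance formula, summed over coordinates
df = np.array([
    sum(np.cov(fits[:, j, i], ys[:, i])[0, 1] for i in range(n)) / sigma**2
    for j in range(len(lam_grid))
])
print(np.round(df, 2))
```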
“…The common intuition that "effective" or "equivalent" degrees of freedom serves as a consistent and interpretable measure of model complexity merits some degree of skepticism. Our results and examples, combined with those of Kaufman and Rosset (2014), demonstrate that for many widely-used convex and non-convex fitting techniques, the DF can be non-monotone with respect to model nesting. In the nonconvex case, the DF can exceed the dimension of the model space by an arbitrarily large amount.…”
Section: Discussion (mentioning)
confidence: 58%
“…Surprisingly, monotonicity can even break down for methods projecting onto convex sets, including ridge regression and the Lasso (although the DF cannot exceed the dimension of the convex set). The non-monotonicity of DF for such convex methods was discovered independently by Kaufman and Rosset (2014), who give a thorough account. Among other results, they prove that the degrees of freedom of projection onto any convex set must always be smaller than the dimension of that set.…”
Section: "Effective" or "Equivalent" Degrees Of Freedommentioning
confidence: 99%
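The bound in the quoted statement can be checked numerically on a toy convex set. The sketch below (the box [−1, 1]ⁿ, the mean vector, and the noise level are illustrative assumptions, not taken from the cited work) estimates the degrees of freedom of Euclidean projection onto the box via the covariance formula and compares it with the ambient dimension:

```python
# Toy check (the box [-1, 1]^n, the mean vector, and the noise level are
# illustrative assumptions): Monte Carlo degrees of freedom of Euclidean
# projection onto a convex set, df = sum_i Cov(yhat_i, y_i) / sigma^2,
# compared with the dimension of the set.
import numpy as np

rng = np.random.default_rng(2)
n, sigma, reps = 5, 1.0, 20000
mu = np.linspace(-1.5, 1.5, n)            # some coordinates sit near the boundary

ys = mu + sigma * rng.standard_normal((reps, n))
proj = np.clip(ys, -1.0, 1.0)             # Euclidean projection onto the box

df = sum(np.cov(proj[:, i], ys[:, i])[0, 1] for i in range(n)) / sigma**2
print(f"Monte Carlo df = {df:.2f}  (dimension of the set: n = {n})")
```

For the box, the divergence of the projection is simply the number of unclipped coordinates, so by Stein's formula the degrees of freedom equal the expected number of coordinates strictly inside the box, which is at most n.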
“…In the recent papers Kaufman and Rosset (2014) and Janson et al (2015) […] The paper is organized as follows. In Section 2 we provide some basic results on the divergence of projection estimators.…”
Section: Introduction (mentioning)
confidence: 99%