Generalization error rates in kernel regression: the crossover from the noiseless to noisy regime*

Cui, Hugo; Loureiro, Bruno; Krząkała, Florent; Zdeborová, Lenka

doi:10.1088/1742-5468/ac9829

Cited by 7 publications

(5 citation statements)

References 42 publications


“…Methods of statistical physics have traditionally been the tool of choice for obtaining closed-form formulae in this setting (Engel, von Guericke, and den Broeck 2012). Recent works have provided analytical expressions for the generalisation error of high-dimensional kernel regressions (Canatar, Bordelon, and Pehlevan 2021; Bordelon, Canatar, and Pehlevan 2020; Jacot et al 2020; Simon et al 2022; Cui et al 2022). In particular, (Jacot et al 2020) and (Simon et al 2022) rely on the spectral universality assumption, just as we do to estimate the coefficients in our formula.…”

Section: Related Work (mentioning)

confidence: 99%

Double-Descent Curves in Neural Networks: A New Perspective Using Gaussian Processes

El Harzli, Cuenca Grau, Valle-Pérez et al. 2024, AAAI


Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterized regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expressions allow us to explore the generalisation behavior of the corresponding kernel and GP regression. Furthermore, they offer a new interpretation of double-descent in terms of the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.


“…A few recent works, appearing after the completion of this manuscript, also investigate the scaling of test error in related settings. Cui et al (24) study the decay of test error with dataset size for kernel regression in a high-dimensional limit with Gaussian design. Maloney et al (25) examine further a teacher-student framework similar to ours, deriving joint scaling laws using techniques from random matrix theory.…”

Section: Scaling Laws For Deep Neural Network (mentioning)

confidence: 99%

Explaining neural scaling laws

Bahri, Dyer, Kaplan et al. 2024, Proc. Natl. Acad. Sci. U.S.A.


The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents under modifications of task and architecture aspect ratio. Our work provides a taxonomy for classifying different scaling regimes, underscores that there can be different mechanisms driving improvements in loss, and lends insight into the microscopic origin and relationships between scaling exponents.
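The power-law relation between loss and dataset size described in this abstract can be illustrated with a small sketch. It assumes the loss follows L(n) ≈ c · n^(−α) exactly and recovers α by a straight-line fit in log-log space. The function name `fit_power_law_exponent` and the synthetic values are hypothetical and are not taken from the cited paper.

```python
import numpy as np

def fit_power_law_exponent(sizes, losses):
    """Fit losses ~ c * sizes**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration only; returns (alpha, c).
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_loss = np.log(np.asarray(losses, dtype=float))
    # polyfit returns coefficients from highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(log_n, log_loss, 1)
    return -slope, float(np.exp(intercept))

# Synthetic, noise-free data generated with alpha = 0.5 and c = 3.0 (assumed values).
sizes = np.array([1e2, 1e3, 1e4, 1e5])
losses = 3.0 * sizes ** -0.5

alpha, c = fit_power_law_exponent(sizes, losses)
print(f"alpha = {alpha:.3f}, c = {c:.3f}")
```

On real measurements the fit is only meaningful over the range of sizes where a single scaling regime holds, so the points to include are a judgment call.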


“…It can then be shown, if our formulas apply, that κ(0) ∝ 1/n^α. See [11] for a detailed analysis of the consequences of the ridge regression asymptotic equivalents when such assumptions are made.…”

Section: Isotropic Covariance Matrices (mentioning)

confidence: 99%

“…• We consider in Section 4 the ridge regression estimator and re-interpret the results of [14,31,11,36,5] using classical notions from non-parametric statistics, namely the degrees of freedom, a.k.a. effective dimensionality [38,8].…”

Section: Introduction (mentioning)

confidence: 99%

High-dimensional analysis of double descent for linear regression with random projections

Bach 2023, Preprint


We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem, with a high-dimensional analysis based on random matrix theory. We first consider the ridge regression estimator and review earlier results using classical notions from non-parametric statistics, namely degrees of freedom, also known as effective dimensionality. We then compute asymptotic equivalents of the generalization performance (in terms of squared bias and variance) of the minimum norm least-squares fit with random projections, providing simple expressions for the double descent phenomenon.
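The "degrees of freedom" (effective dimensionality) mentioned in this abstract can be sketched numerically. The block below uses the standard ridge definition df(λ) = Σᵢ σᵢ / (σᵢ + λ), where σᵢ are the eigenvalues of the covariance matrix. The function name `degrees_of_freedom` and the power-law spectrum are assumptions chosen for illustration, not the setup of the cited preprint.

```python
import numpy as np

def degrees_of_freedom(eigenvalues, ridge):
    """Effective dimensionality df(lambda) = sum_i s_i / (s_i + lambda).

    Hypothetical helper; `eigenvalues` are covariance eigenvalues, `ridge` is lambda >= 0.
    """
    s = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(s / (s + ridge)))

# Assumed power-law spectrum s_i = i^(-2) over 1000 directions.
spectrum = np.arange(1, 1001, dtype=float) ** -2.0

df_small_ridge = degrees_of_freedom(spectrum, 1e-4)
df_large_ridge = degrees_of_freedom(spectrum, 1e-1)
print(df_small_ridge, df_large_ridge)
```

A smaller ridge lets more eigen-directions count as effectively "active", so the degrees of freedom grow as λ decreases and reach the full dimension at λ = 0 when all eigenvalues are positive.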
