2021
DOI: 10.48550/arxiv.2110.03922
Preprint

A Theory of the Inductive Bias and Generalization of Kernel Regression and Wide Neural Networks

Cited by 3 publications (11 citation statements)
References 0 publications
“…4B), generalization performance depends on values of the kernel function evaluated across this entire range of overlaps. In particular, methods from the theory of kernel regression (Sollich, 1998; Jacot et al, 2018; Bordelon et al, 2020; Canatar et al, 2021b; Simon et al, 2021) quantify a network’s performance on a learning task by decomposing the target function into a set of basis functions (Fig. 4D).…”
Section: Results (mentioning)
confidence: 99%
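The decomposition referred to in this citation can be sketched numerically. Below is a minimal Python sketch (not taken from the cited paper): it estimates the kernel's eigenfunctions from a finite sample and expands an illustrative target in that basis. The RBF kernel, sample size, and target function are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1.0, 1.0, size=(n, 1))      # inputs drawn from the data distribution

def rbf_kernel(A, B, length_scale=0.5):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

K = rbf_kernel(X, X)
eigvals, eigvecs = np.linalg.eigh(K / n)      # empirical kernel eigenvalues / eigenfunctions
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

y = np.sin(3.0 * np.pi * X[:, 0])             # illustrative target function
coeffs = eigvecs.T @ y / np.sqrt(n)            # target weights c_alpha in the kernel eigenbasis

# The spectrum (eigvals) and the target weights (coeffs) together summarize how
# hard the task is for kernel regression: power on large-eigenvalue modes is
# learned from few samples, power on small-eigenvalue modes needs many more.
print(eigvals[:5])
print(coeffs[:5])
```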
“…where C_1 and C_2 do not depend on α (Canatar et al, 2021b; Simon et al, 2021; see Methods). Equation (4) illustrates that for equal values of c_α, modes with greater λ_α contribute less to the generalization error.…”
Section: Geometry of the Expansion Layer Representation (mentioning)
confidence: 99%
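To make the monotone dependence on λ_α concrete, here is a hedged numerical sketch of a mode-wise error of this general form, following the kernel eigenframework of the cited works (Canatar et al., 2021b; Simon et al., 2021). The specific normalization, ridge value, and toy spectrum are illustrative assumptions, not the citing paper's Equation (4).

```python
import numpy as np

def mode_errors(eigvals, coeffs, p, ridge=1e-3, n_iter=500):
    """Per-mode contribution to the generalization error with p training samples."""
    eigvals = np.asarray(eigvals, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    kappa = ridge + eigvals.sum()                                   # initial guess
    for _ in range(n_iter):                                         # self-consistent equation for kappa
        kappa = ridge + np.sum(eigvals * kappa / (p * eigvals + kappa))
    gamma = np.sum(p * eigvals ** 2 / (p * eigvals + kappa) ** 2)
    # One common form: E_alpha = C1 * c_alpha^2 / (p * lambda_alpha + C2)^2,
    # with C1 = kappa^2 / (1 - gamma) and C2 = kappa independent of alpha.
    return (kappa ** 2 / (1.0 - gamma)) * coeffs ** 2 / (p * eigvals + kappa) ** 2

eigvals = np.array([1.0, 0.1, 0.01])    # toy spectrum, lambda_1 > lambda_2 > lambda_3
coeffs = np.ones(3)                     # equal target weights c_alpha
print(mode_errors(eigvals, coeffs, p=50))
# With equal c_alpha, the mode with the largest lambda_alpha contributes the least
# error, matching the statement quoted above.
```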
“…We qualify the performance of kernels based on the theory of the kernel method [17] and the neural tangent kernel theory for linear models [15, 30–37]. Better kernels will have flatter eigenspectra with more non-trivial kernel eigenvalues, which will lead to faster convergence speed [37] (see discussions in SM) and less generalization error for good enough alignments [38–42]. These features are observable through the numerical results in the kernel trick and the gradient descent dynamics.…”
mentioning
confidence: 99%
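The two properties invoked in this citation, spectral flatness and target-kernel alignment, can be made concrete with simple diagnostics. The sketch below uses a participation ratio for flatness and a cumulative-power curve for alignment; the function names and toy spectra are illustrative choices, not quantities defined in the quoted paper.

```python
import numpy as np

def effective_dimension(eigvals):
    """Participation ratio of the eigenspectrum: larger for flatter spectra."""
    eigvals = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)
    return eigvals.sum() ** 2 / np.sum(eigvals ** 2)

def cumulative_power(eigvecs, y):
    """Fraction of the target's power captured by the top-k kernel eigenmodes
    (eigvecs assumed sorted by decreasing eigenvalue); a curve that rises
    quickly indicates good target-kernel alignment."""
    c = eigvecs.T @ np.asarray(y, dtype=float)
    power = np.cumsum(c ** 2)
    return power / power[-1]

# Toy comparison: a flat spectrum has a much larger effective dimension than a
# fast-decaying one, which is the sense of "flatter eigenspectra" above.
flat = np.ones(100)
decaying = 1.0 / np.arange(1, 101) ** 2
print(effective_dimension(flat), effective_dimension(decaying))   # 100.0 vs roughly 2.5
```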