2022
DOI: 10.48550/arxiv.2202.06526
Preprint

Benign Overfitting in Two-layer Convolutional Neural Networks

Abstract: Modern neural networks often have great expressive power and can be trained to overfit the training data, while still achieving a good test performance. This phenomenon is referred to as "benign overfitting". Recently, there emerges a line of works studying "benign overfitting" from the theoretical perspective. However, they are limited to linear models or kernel/random feature models, and there is still a lack of theoretical understanding about when and how benign overfitting occurs in neural networks. In thi…

Cited by 5 publications (8 citation statements)
References 18 publications

“…We point out that, as in many previous works (Allen-Zhu and Li, 2020; Zou et al., 2021; Cao et al., 2022), polynomial ReLU activation can help us simplify the analysis of gradient descent, because polynomial ReLU activation can give a much larger separation of signal and noise (thus, cleaner analysis) than ReLU. Our analysis can be generalized to ReLU activation by using the arguments in (Allen-Zhu and Li, 2022).…”
mentioning
confidence: 67%
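The excerpt above leans on the polynomial ReLU activation σ(z) = max(z, 0)^q common in this line of work. Below is a minimal NumPy sketch of the kind of two-layer CNN score such an activation is plugged into; the function names, the fixed ±1 second-layer weights, and q = 3 are illustrative assumptions, not details quoted from the cited papers.

import numpy as np

def poly_relu(z, q=3):
    # Polynomial ReLU: sigma(z) = max(z, 0)**q; q = 1 recovers the ordinary ReLU.
    return np.maximum(z, 0.0) ** q

def two_layer_cnn_score(x_patches, W_pos, W_neg, q=3):
    # x_patches: (P, d) array of input patches.
    # W_pos, W_neg: (m, d) first-layer filters attached to fixed +1 and -1
    # second-layer weights, respectively.
    pos = poly_relu(x_patches @ W_pos.T, q).sum()  # activations summed over patches and filters
    neg = poly_relu(x_patches @ W_neg.T, q).sum()
    return (pos - neg) / W_pos.shape[0]            # sign of the score is the predicted label

# Because sigma grows like z**q for z > 0, filters strongly aligned with the class
# signal dominate the score while weakly aligned (noise) directions are suppressed --
# the "larger separation of signal and noise" the excerpt refers to.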
“…In order to go beyond the NTK regime, one line of research has focused on the mean field limit (Song et al., 2018; Chizat and Bach, 2018; Rotskoff and Vanden-Eijnden, 2018; Wei et al., 2019; Chen et al., 2020a; Sirignano and Spiliopoulos, 2020; Fang et al., 2021). Recently, people have started to study the neural network training dynamics in the feature learning regime, where data from different classes is defined by a set of class-related signals which are low rank (Allen-Zhu and Li, 2020, 2022; Cao et al., 2022; Shi et al., 2021; Telgarsky, 2022). However, none of these previous works considered the effect of pruning.…”
Section: Related Work
mentioning
confidence: 99%
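The "class-related signals which are low rank" setup mentioned above is commonly instantiated as data in which one patch carries the label-scaled signal vector and the other patch is pure noise. A hedged sketch of such a generator, with all names and parameter values chosen for illustration rather than taken from the cited papers:

import numpy as np

def sample_signal_noise_data(n, d, signal_norm=5.0, noise_std=1.0, seed=0):
    # Toy "signal + noise" data: each example has one patch carrying y * mu
    # (a fixed rank-one signal direction) and one patch of Gaussian noise.
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)
    mu[0] = signal_norm                                  # rank-one class signal direction
    y = rng.choice([-1, 1], size=n)                      # balanced binary labels
    signal_patch = y[:, None] * mu                       # (n, d) label-scaled signal patches
    noise_patch = noise_std * rng.standard_normal((n, d))
    X = np.stack([signal_patch, noise_patch], axis=1)    # (n, 2, d): two patches per example
    return X, y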
“…A series of studies have proved convergence (Jacot et al., 2018; Li and Liang, 2018; Du et al., 2019; Allen-Zhu et al., 2019b; Zou et al., 2018) and generalization (Allen-Zhu et al., 2019a; Arora et al., 2019a,b; Cao and Gu, 2019) guarantees in the so-called "neural tangent kernel" (NTK) regime, where the parameters stay close to the initialization and the neural network function is approximately linear in its parameters. A recent line of works (Allen-Zhu and Li, 2019; Bai and Lee, 2019; Allen-Zhu and Li, 2020a,b,c; Li et al., 2020; Cao et al., 2022; Zou et al., 2021; Wen and Li, 2021) studied the learning dynamics of neural networks beyond the NTK regime. It is worth mentioning that our analysis of the MoE model is also beyond the NTK regime.…”
Section: Related Work
mentioning
confidence: 99%
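For context, the "approximately linear in its parameters" property of the NTK regime quoted above is the standard first-order expansion around the initialization θ₀ (standard notation, not taken verbatim from the cited works):

f(x;\theta) \approx f(x;\theta_0) + \big\langle \nabla_\theta f(x;\theta_0),\; \theta - \theta_0 \big\rangle,
\qquad
K_{\mathrm{NTK}}(x,x') = \big\langle \nabla_\theta f(x;\theta_0),\; \nabla_\theta f(x';\theta_0) \big\rangle .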