2021
DOI: 10.48550/arxiv.2103.09177
Preprint

Deep learning: a statistical viewpoint

Abstract: The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find i…

Cited by 17 publications (24 citation statements) · References 60 publications
“…In this paper, we survey the emerging field of TOPML research with a principal focus on foundational principles developed in the past few years. Compared to other recent surveys (Bartlett et al., 2021; Belkin, 2021), we take a more elementary signal processing perspective to elucidate these principles. Formally, we define the TOPML research area as the sub-field of ML theory where (1) there is clear consideration of exact or near interpolation of training data, and (2) the learned model complexity is high with respect to the training dataset size.…”
Section: Contents of This Paper
confidence: 99%
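To make the two defining TOPML conditions above concrete, the following minimal sketch (our own illustration on synthetic data, not code from the surveyed papers) fits a linear model with far more parameters than training points and checks that it interpolates arbitrary labels essentially exactly:

```python
# Minimal sketch of the TOPML regime (assumed synthetic setup): the learned
# model has p >> n parameters and fits the training data essentially exactly.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 500                               # n training points, p >> n features
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)                   # arbitrary labels; interpolation still succeeds

# For this underdetermined system, lstsq returns the minimum l2-norm solution of Xw = y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

train_mse = np.mean((X @ w - y) ** 2)
print(f"p/n = {p / n:.0f}, training MSE = {train_mse:.2e}")   # ~1e-30: (near-)exact interpolation
```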
“…≥ λ_p. We refer the reader to Bartlett et al. (2021) for a detailed exposition of these results and their consequences. Here, we present the essence of these results in a popular model used in high-dimensional statistics, the spiked covariance model.…”
Section: When Does the Minimum ℓ2-Norm Solution Enable Harmless Interpolation…
confidence: 99%
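For context on the model named in this excerpt, one standard formulation of the spiked covariance model (a common textbook form, stated here as background rather than quoted from the citing paper) is:

```latex
% Spiked covariance model: a few large "spike" eigenvalues on top of an isotropic bulk.
\[
  \Sigma \;=\; \sum_{i=1}^{k} \theta_i\, v_i v_i^{\top} \;+\; \sigma^2 I_p,
  \qquad \theta_1 \ge \dots \ge \theta_k > 0, \quad v_i^{\top} v_j = \delta_{ij},
\]
% so the eigenvalues satisfy
\[
  \lambda_i(\Sigma) = \theta_i + \sigma^2 \ (i \le k), \qquad
  \lambda_i(\Sigma) = \sigma^2 \ (k < i \le p).
\]
```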
“…These empirical mysteries inspire a recent flurry of activity towards understanding the generalization properties of various interpolators. A dominant fraction of recent efforts, however, concentrated on studying certain minimum ℓ2-norm interpolators, primarily in the context of linear and/or kernel regression (see, e.g., Mei and Montanari (2019); Hastie et al. (2019); Belkin et al. (2020); Bartlett et al. (2020, 2021) and the references therein). This was in part due to the existence of closed-form expressions for minimum ℓ2-norm interpolators, which are particularly handy when determining the statistical risk.…”
Section: Introduction
confidence: 99%
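The closed-form expression this excerpt refers to is w = X^T (X X^T)^{-1} y, the minimum ℓ2-norm interpolator when X has full row rank. The sketch below (our own illustration with an assumed isotropic Gaussian design) computes it and evaluates the resulting training error and excess risk:

```python
# Minimal sketch (assumed synthetic Gaussian design, not from the cited works):
# the minimum l2-norm interpolator has the closed form w_hat = X^T (X X^T)^{-1} y,
# which is what makes its statistical risk analytically tractable.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 400
w_star = rng.standard_normal(p) / np.sqrt(p)        # ground-truth linear signal
X = rng.standard_normal((n, p))                     # isotropic covariates
y = X @ w_star + 0.1 * rng.standard_normal(n)

w_hat = X.T @ np.linalg.solve(X @ X.T, y)           # equals pinv(X) @ y for full row rank X

# Under isotropic covariates, the excess risk of w_hat is simply ||w_hat - w_star||^2.
print("training MSE:", np.mean((X @ w_hat - y) ** 2))   # ~0: exact interpolation
print("excess risk :", np.sum((w_hat - w_star) ** 2))
```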
“…Obstacles in the theoretical foundation include the higher-order nonlinear structures due to the stacking of multiple layers and the excessive number of network parameters in state-of-the-art networks. For some recent surveys, see [5,6].…”
Section: R(A) − R_n(A)
confidence: 99%
“…For the filtration {F_t}_{t≥0} introduced before, batch gradient noise is defined as the A-dependent F_{t+1}-measurable random vector W_{t+1}(A) = √m (∇R_n(A) − ∇R^t_n(A)). This random variable measures the effect of the subsampling on the gradient and allows (5) to be rewritten as…”
Section: Relation Between…
confidence: 99%
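As a rough illustration of the quoted definition, the sketch below (our own toy least-squares setup; the data and helper names are assumptions, and (5) refers to an update rule not reproduced here) computes the batch gradient noise W_{t+1}(A) = √m (∇R_n(A) − ∇R^t_n(A)) for one mini-batch:

```python
# Toy illustration (assumed least-squares risk): batch gradient noise
#   W_{t+1}(A) = sqrt(m) * (grad R_n(A) - grad R^t_n(A)),
# where R_n is the empirical risk on all n samples and R^t_n the risk on the
# size-m mini-batch drawn at step t.
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 1000, 10, 32
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def grad(A, idx):
    """Gradient of the squared-error risk 0.5 * mean((X A - y)^2) over the rows in idx."""
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ A - yi) / len(idx)

A = np.zeros(d)                                   # current parameter vector
batch = rng.choice(n, size=m, replace=False)      # mini-batch drawn at step t
W = np.sqrt(m) * (grad(A, np.arange(n)) - grad(A, batch))

print("||W_{t+1}(A)|| =", np.linalg.norm(W))      # size of the subsampling effect
```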