2020
DOI: 10.48550/arxiv.2003.05508
Preprint

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

Abstract: Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks although the optimization landscape is known to be highly non-convex. To understand the success of SGD for training deep neural networks, this work presents a mean-field analysis of deep residual networks, based on a line of works that interpret the continuum limit of the deep residual network as an ordinary differential equation when the network capacity tends to infinity. Specifically…
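For orientation, the continuum limit referred to in the abstract can be sketched as follows (schematic notation assumed here, not quoted from the paper): with hidden state x_l at residual block l, depth L, and block parameters theta_l, the forward pass

    x_{l+1} = x_l + \tfrac{1}{L}\, f(x_l, \theta_l), \qquad l = 0, 1, \dots, L-1,

is an explicit Euler discretization with step size 1/L of the ordinary differential equation

    \dot{x}(t) = f\big(x(t), \theta(t)\big), \qquad t \in [0, 1],

so letting the depth L tend to infinity recovers the continuous-time dynamics.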

Cited by 20 publications (18 citation statements) | References 46 publications
“…As a remark, our approach can also be applied to prove a similar global convergence guarantee for two-layer networks, removing the convex loss assumption in previous works (Nguyen & Pham (2020)). The recent work Lu et al (2020) on a MF resnet model (a composition of many two-layer MF networks) and a recent update of Sirignano & Spiliopoulos (2019) essentially establish conditions of stationary points to be global optima. They however require strong assumptions on the support of the limit point.…”
Section: Discussion
confidence: 99%
“…We then generate 500 training points followed by 100 test points with time step h = 0.01. That means the solution to this equation during the time interval [0, 5] is treated as the training set, and then we learn the data using a PNN and predict the solution between [5, 6]. The result is shown in Fig.…”
Section: Nonlinear Schrödinger Equation
confidence: 99%
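As a rough illustration of the train/test protocol quoted above, the following Python sketch splits a time series sampled at h = 0.01 into 500 training points on [0, 5) and 100 test points on [5, 6). The trajectory here is a hypothetical stand-in, since the excerpt does not specify the nonlinear Schrödinger solver used in the cited work.

    import numpy as np

    def stand_in_solution(t):
        # Hypothetical stand-in trajectory; the cited experiment solves a
        # nonlinear Schrodinger equation whose solver is not given in the excerpt.
        return np.stack([np.cos(t), np.sin(t)], axis=-1)

    h = 0.01                        # time step from the quoted setup
    t = np.arange(0.0, 6.0, h)      # 600 samples covering [0, 6)
    states = stand_in_solution(t)   # shape (600, 2)

    train = states[:500]            # t in [0, 5): 500 training points
    test = states[500:600]          # t in [5, 6): 100 points held out for prediction
    print(train.shape, test.shape)  # (500, 2) (100, 2)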
“…The connection between dynamical systems and neural network models has been widely studied in the literature, see, for example, [1]-[5]. In general, neural networks can be considered as discrete dynamical systems with the basic dynamics at each step being a linear transformation followed by a component-wise nonlinear (activation) function.…”
Section: Introduction
confidence: 99%
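The "discrete dynamical system" reading in this excerpt can be made concrete with a small sketch (a toy residual architecture assumed for illustration, not the model analyzed in the paper): each step applies a linear transformation followed by a component-wise nonlinearity, added back to the state as a residual update.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, depth = 4, 16
    # Toy residual network: each step is a linear map followed by a
    # component-wise nonlinearity, scaled by 1/depth so that the deep limit
    # behaves like an ODE flow.
    weights = [rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, dim)) for _ in range(depth)]

    def forward(x):
        for W in weights:
            x = x + (1.0 / depth) * np.tanh(W @ x)  # x_{k+1} = x_k + (1/L) * sigma(W_k x_k)
        return x

    print(forward(rng.normal(size=dim)))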
“…There is a rising interest in modeling the forward propagation of the deep neural networks using a controlled dynamics and in connecting the deep learning to the optimal control problems, see e.g. [5,6,9,17,20,21]. For the controlled processes in the particular form (4.1), we refer to Section 4 in [15] for the connection between the optimal control problem and the deep neural networks.…”
Section: 3
confidence: 99%
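The optimal-control view mentioned in this excerpt is commonly written in the following schematic form (notation assumed here rather than taken from [15]): the parameters theta(t) act as a control steering the hidden state x(t), and training selects the control that minimizes the loss at the terminal time.

    \min_{\theta(\cdot)} \; \mathbb{E}_{(x_0, y)}\!\left[ \ell\big(x(T), y\big) \right]
    \quad \text{subject to} \quad
    \dot{x}(t) = f\big(x(t), \theta(t)\big), \quad x(0) = x_0, \quad t \in [0, T].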