2018 · Preprint
DOI: 10.48550/arxiv.1812.00174
Stochastic Training of Residual Networks: a Differential Equation Viewpoint

Abstract: During the last few years, significant attention has been paid to the stochastic training of artificial neural networks, which is known to be an effective regularization approach that helps improve the generalization capability of trained models. In this work, the method of modified equations is applied to show that the residual network and its variants with noise injection can be regarded as weak approximations of stochastic differential equations. Such observations enable us to bridge the stochastic training pr…
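To make the abstract's central claim concrete, here is a minimal sketch (my own illustration, not the paper's code) of how a residual block with Gaussian noise injection coincides with one Euler–Maruyama step of an SDE dX = f(X) dt + σ dW; the drift f and noise level σ below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Hypothetical residual branch (drift term); stands in for a trained layer."""
    return np.tanh(x)

def noisy_residual_block(x, sigma=0.1):
    """Residual block with Gaussian noise injection:
    x <- x + f(x) + sigma * xi, with xi ~ N(0, I)."""
    return x + f(x) + sigma * rng.standard_normal(x.shape)

def euler_maruyama_step(x, h=1.0, sigma=0.1):
    """One Euler-Maruyama step of dX = f(X) dt + sigma dW.
    With h = 1 this coincides with the noisy residual block above,
    since the Brownian increment scales as sqrt(h)."""
    return x + h * f(x) + sigma * np.sqrt(h) * rng.standard_normal(x.shape)
```

With h = 1 the two update rules are identical; the modified-equations analysis in the paper concerns the weak (in-distribution) sense in which repeated updates of this kind track the continuous SDE.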

Cited by 9 publications (10 citation statements)
References 21 publications
“…Clearly, the framework of NODEs can be formulated as a typical problem of optimal control on ODEs. Additionally, the framework of NODEs has been generalized to other dynamical systems, such as Partial Differential Equations (PDEs) (Han et al., 2018; Long et al., 2018; Ruthotto & Haber, 2019; Sun et al., 2020) and Stochastic Differential Equations (SDEs) (Sun et al., 2018), where the theory of optimal control has been fully established. It is worth mentioning that optimal control theory is tightly connected with, and benefits from, the classical calculus of variations (Liberzon, 2011).…”
Section: Related Work
confidence: 99%
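As background for the optimal-control reading in the statement above, the NODE training problem is commonly written in the following form (a standard textbook formulation, not quoted from the cited works; Φ is the terminal loss and L an optional running cost):

```latex
\min_{\theta}\ \Phi\big(x(T)\big) + \int_{0}^{T} L\big(x(t),\theta(t)\big)\,dt
\quad\text{subject to}\quad
\dot{x}(t) = f\big(x(t),\theta(t),t\big), \qquad x(0) = x_0 .
```

Here the network weights play the role of the control θ, and the adjoint equations used for backpropagation arise as first-order optimality conditions of this problem.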
“…can indeed be seen as a discretized Euler scheme with unit step size for the ordinary differential equation (ODE) ẋᵢ = T(xᵢ) (Weinan, 2017; Chen et al., 2018; Teh et al., 2019; Sun et al., 2018; Weinan et al., 2019; Lu et al., 2018; Ruthotto and Haber, 2019; Sander et al., 2021). In section 4, we adopt this point of view on residual attention layers in order to get a better theoretical understanding of attention mechanisms.…”
Section: Background and Related Work
confidence: 99%
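A small sketch of this correspondence (my illustration; the residual branch T below is a hypothetical two-layer map, not one taken from the quoted paper):

```python
import numpy as np

def T(x, W1, W2):
    """Hypothetical residual branch: a two-layer tanh network."""
    return W2 @ np.tanh(W1 @ x)

def resnet_forward(x, layers):
    """Residual network: x_{i+1} = x_i + T(x_i)."""
    for W1, W2 in layers:
        x = x + T(x, W1, W2)
    return x

def euler_forward(x, layers, h=1.0):
    """Explicit Euler scheme for dx/dt = T(x) with step size h;
    h = 1.0 reproduces resnet_forward exactly."""
    for W1, W2 in layers:
        x = x + h * T(x, W1, W2)
    return x

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((16, 8)), rng.standard_normal((8, 16)))
          for _ in range(5)]
x0 = rng.standard_normal(8)
assert np.allclose(resnet_forward(x0, layers), euler_forward(x0, layers, h=1.0))
```

The residual update and the unit-step Euler scheme are the same computation; viewing the layer index as a time variable is what licenses the ODE interpretation.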
“…Accordingly, Novak et al. [13] distinguished different network models based on their sensitivity and proposed a robustness measure based on the input-output Jacobian. Alternatively, several researchers have connected neural networks to dynamical models in order to estimate their complexity from a theoretical perspective, by establishing a parallel between network architectures and stochastic partial differential equations [20], [21], [22]. A different approach, taken by Zhang et al. [23], focuses on complexity estimation for recurrent neural networks, using the concept of a cyclic graph to define a recurrent depth or a recurrent skip coefficient that captures how rapidly information propagates over time.…”
Section: Related Work
confidence: 99%
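For illustration, a Jacobian-based sensitivity score of the kind mentioned above might be computed as follows (a hedged sketch: the toy network and the choice of Frobenius norm are my placeholders, not the exact measure of Novak et al. [13]):

```python
import torch

# Toy stand-in for a trained model (hypothetical, for illustration only).
net = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 4)
)

x = torch.randn(8)

# Input-output Jacobian J[i, j] = d out_i / d x_j at the point x.
J = torch.autograd.functional.jacobian(net, x)

# Frobenius norm of J as one simple sensitivity proxy: larger values mean the
# output changes more sharply under small input perturbations.
sensitivity = torch.linalg.matrix_norm(J).item()
print(f"Jacobian sensitivity at x: {sensitivity:.4f}")
```

In practice such scores are usually averaged over many input points drawn from the data distribution rather than evaluated at a single x.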