2020
DOI: 10.1142/s0219530520400023
Neural ODEs as the deep limit of ResNets with constant weights

Abstract: In this paper we prove that, in the deep limit, stochastic gradient descent on a ResNet-type deep neural network, in which every layer shares the same weight matrix, converges to stochastic gradient descent for a Neural ODE, and that the corresponding value/loss functions converge. Our result gives, in the context of minimization by stochastic gradient descent, a theoretical foundation for considering Neural ODEs as the deep limit of ResNets. Our proof is based on certain decay estimates for associated Fokker–Planck equations.
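
To make the setting of the abstract concrete, here is a minimal sketch (not taken from the paper) of a ResNet in which every layer shares one weight matrix, viewed as a forward-Euler discretization of a Neural ODE; the specific residual map tanh(Wx + b) and the step size 1/depth are assumptions made for illustration.

```python
import numpy as np

def residual_map(x, W, b):
    # Shared residual block: the same (W, b) is reused in every layer.
    return np.tanh(W @ x + b)

def resnet_constant_weights(x0, W, b, depth):
    # x_{k+1} = x_k + (1/depth) * f(x_k; W, b): forward Euler with step 1/depth,
    # so the network approximates the Neural ODE dx/dt = f(x; W, b) on [0, 1].
    x = x0.copy()
    h = 1.0 / depth
    for _ in range(depth):
        x = x + h * residual_map(x, W, b)
    return x

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, d)) / np.sqrt(d)
b = rng.normal(size=d)
x0 = rng.normal(size=d)

# As the depth grows, the shared-weight ResNet output stabilizes toward the
# Neural ODE solution at time t = 1 (the "deep limit" of the abstract).
for depth in (4, 16, 64, 256, 1024):
    print(depth, resnet_constant_weights(x0, W, b, depth))
```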

Cited by 18 publications (17 citation statements)
References 21 publications
“…Later on, this dynamical approach has been greatly popularized in the machine learning community under the name of NeurODE by Chen et al [27], see also [52]. The formulation starts by re-interpreting the iteration (1.2) as a discrete-time Euler approximation [9] of the following dynamical system $\dot{X}_t = F(t, X_t, \theta_t)$, …”
Section: NeurODEs and Stochastic Optimal Control
confidence: 99%
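
Spelled out, the re-interpretation quoted above is the standard forward-Euler correspondence; the following is a sketch in which the step size h = T/N and the per-layer sampling of the parameters are assumptions, since iteration (1.2) itself is not reproduced here.

```latex
% With N layers, step size h = T/N, and times t_k = kh, the residual update
%   X_{k+1} = X_k + h F(t_k, X_k, \theta_k),  k = 0, ..., N-1,
% is the explicit Euler scheme for the continuous-time dynamical system.
\[
  X_{k+1} = X_k + h\,F(t_k, X_k, \theta_k)
  \quad\longleftrightarrow\quad
  \dot{X}_t = F(t, X_t, \theta_t), \qquad t \in [0, T],\ h = T/N,\ t_k = kh .
\]
```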
“…(3) Mean field analysis of (stochastic) gradient descent for two-layer neural networks [16,45,54,61] and multi-layer fully connected networks [2,49,60]. (4) The function space work [24,25]. Also related are the works in [3,4,7,53,64].…”
Section: Introduction
confidence: 99%
“…(4) The function space work [24,25]. Also related are the works in [3,4,7,53,64]. The work presented here is a natural extension of these ideas.…”
Section: Introduction
confidence: 99%
“…Since the initial proposal of residual networks [26], many works have studied them theoretically, observing that the forward pass of a residual network resembles the explicit Euler scheme of an ordinary differential equation [30][31][32]. The question of stability, invertibility and reusability of the convolutional filters became central [33,34].…”
Section: Recurrent Residual Network as ODE
confidence: 99%