This paper deals with neural networks as dynamical systems governed by finite difference equations. It shows that the introduction of k-many skip connections into network architectures, such as residual networks and additive dense networks, defines k-th order dynamical equations on the layer-wise transformations. Closed-form solutions for the state-space representations of general k-th order additive dense networks, where the concatenation operation is replaced by addition, as well as of k-th order smooth networks, are found. The developed formulation endows deep neural networks with an algebraic structure. Furthermore, it is shown that imposing k-th order smoothness on network architectures with d-many nodes per layer increases the state-space dimension by a factor of k, and so the effective embedding dimension of the data manifold by the neural network is k·d-many dimensions. It follows that network architectures of these types reduce the number of parameters needed to maintain the same embedding dimension by a factor of k^2 when compared to an equivalent first-order residual network. Numerical simulations and experiments on CIFAR10, SVHN, and MNIST have been conducted to help understand the developed theory and the efficacy of the proposed concepts.

A standard feed-forward network is a universal approximator [9], where it is learning something similar to a piecewise-linear, finite-mesh approximation of the data manifold. Recent work, consistent with the original intuition of learning perturbations from the identity, has shown that residual networks, with their first-order perturbation term, can be formulated as a finite difference approximation of a first-order differential equation [5]. This has the interesting consequence that residual networks are C^1-smooth dynamical equations through the layers of the network. Additionally, one may then define entire classes of C^k-differentiable transformations over the layers and induce network architectures from their finite difference approximations.

Work by Chang et al. [3] also considered residual neural networks as forward-difference approximations to C^1 transformations. This work has been extended to develop new network architectures by using central differencing, as opposed to forward differencing, to approximate the set of coupled first-order differential equations, yielding the Midpoint Network [2]. Similarly, other researchers have used different numerical schemes to approximate the first-order ordinary differential equations, such as the linear multistep method used to develop the Linear Multistep architecture [10]. This differs from the previous work [5], where entire classes of finite difference approximations to k-th order differential equations are defined. Haber and Ruthotto [4] considered how stability techniques from finite difference methods can be applied to improve first- and second-order smooth neural networks. For example, they suggest requiring that the real parts of the eigenvalues of the Jacobians of the layer transformations be approximately equal to zero. This ensures that little information about...
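To make the finite-difference view above concrete, the following Python sketch (our own minimal illustration, not code from the paper; the tanh layer, the names residual_step and second_order_step, and all shapes are assumptions) steps a residual block x_{l+1} = x_l + f(x_l), the unit-step forward-difference discretization of dx/dt = f(x), alongside a second-order block x_{l+1} = 2x_l - x_{l-1} + f(x_l), the central-difference discretization of d^2x/dt^2 = f(x), which requires skip connections from the two preceding layers.

```python
import numpy as np

def layer_fn(x, W, b):
    # Illustrative layer transformation f(x) = tanh(W x + b);
    # stands in for any parameterized layer (an assumption for this sketch).
    return np.tanh(W @ x + b)

def residual_step(x_l, W, b):
    # k = 1 (residual) block: x_{l+1} = x_l + f(x_l).
    # With unit step size, this is the forward-difference (forward Euler)
    # discretization of dx/dt = f(x).
    return x_l + layer_fn(x_l, W, b)

def second_order_step(x_l, x_lm1, W, b):
    # k = 2 block: x_{l+1} = 2*x_l - x_{l-1} + f(x_l).
    # Central-difference discretization of d^2x/dt^2 = f(x); it needs skip
    # connections from the two preceding layers, and its state-space form
    # carries the pair (x_l, x_{l-1}), i.e. a 2*d-dimensional state.
    return 2.0 * x_l - x_lm1 + layer_fn(x_l, W, b)

rng = np.random.default_rng(0)
d, depth = 8, 5
x0 = rng.normal(size=d)

# k = 1: chain of residual blocks.
x = x0
for _ in range(depth):
    W, b = 0.1 * rng.normal(size=(d, d)), np.zeros(d)
    x = residual_step(x, W, b)

# k = 2: chain that keeps the two most recent activations as its state.
x_lm1, x_l = x0, x0
for _ in range(depth):
    W, b = 0.1 * rng.normal(size=(d, d)), np.zeros(d)
    x_lm1, x_l = x_l, second_order_step(x_l, x_lm1, W, b)
```

Carrying the pair (x_l, x_{l-1}) as the state in the k = 2 chain is what makes the enlarged k·d-dimensional state space, and hence the larger effective embedding dimension per parameter, explicit in this toy setting.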