This paper deals with neural networks as dynamical systems governed by finite difference equations. It shows that the introduction of k-many skip connections into network architectures, such as residual networks and additive dense networks, defines k-th order dynamical equations on the layer-wise transformations. Closed-form solutions for the state-space representations of general k-th order additive dense networks, where the concatenation operation is replaced by addition, as well as of k-th order smooth networks, are found. The developed formulation endows deep neural networks with an algebraic structure. Furthermore, it is shown that imposing k-th order smoothness on network architectures with d-many nodes per layer increases the state-space dimension by a factor of k, and so the effective embedding dimension of the data manifold by the neural network is k·d-many dimensions. It follows that network architectures of these types reduce the number of parameters needed to maintain the same embedding dimension by a factor of k^2 when compared to an equivalent first-order residual network. Numerical simulations and experiments on CIFAR10, SVHN, and MNIST have been conducted to help understand the developed theory and the efficacy of the proposed concepts.

A standard feed-forward network is a universal approximator [9], where it is learning something similar to a piecewise-linear, finite-mesh approximation of the data manifold. Recent work, consistent with the original intuition of learning perturbations from the identity, has shown that residual networks, with their first-order perturbation term, can be formulated as a finite difference approximation of a first-order differential equation [5]. This has the interesting consequence that residual networks are C^1-smooth dynamical equations through the layers of the network. Additionally, one may then define entire classes of C^k-differentiable transformations over the layers and induce network architectures from their finite difference approximations.

Work by Chang et al. [3] also considered residual neural networks as forward-difference approximations to C^1 transformations. This work has been extended to develop new network architectures by using central differencing, as opposed to forward differencing, to approximate the set of coupled first-order differential equations, yielding the Midpoint Network [2]. Similarly, other researchers have used different numerical schemes to approximate the first-order ordinary differential equations, such as the linear multistep method used to develop the Linear Multistep architecture [10]. This differs from the previous work [5], where entire classes of finite difference approximations to k-th order differential equations are defined. Haber and Ruthotto [4] considered how stability techniques from finite difference methods can be applied to improve first- and second-order smooth neural networks. For example, they suggest requiring that the real parts of the eigenvalues of the Jacobians of the layer transformations be approximately equal to zero. This ensures that little information about...
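To make the finite-difference view above concrete, the following Python sketch (our own minimal illustration, not code from the paper; the tanh layer, the names residual_step and second_order_step, and all shapes are assumptions) steps a residual block x_{l+1} = x_l + f(x_l), the unit-step forward-difference discretization of dx/dt = f(x), alongside a second-order block x_{l+1} = 2x_l - x_{l-1} + f(x_l), the central-difference discretization of d^2x/dt^2 = f(x), which requires skip connections from the two preceding layers.

```python
import numpy as np

def layer_fn(x, W, b):
    # Illustrative layer transformation f(x) = tanh(W x + b);
    # stands in for any parameterized layer (an assumption for this sketch).
    return np.tanh(W @ x + b)

def residual_step(x_l, W, b):
    # k = 1 (residual) block: x_{l+1} = x_l + f(x_l).
    # With unit step size, this is the forward-difference (forward Euler)
    # discretization of dx/dt = f(x).
    return x_l + layer_fn(x_l, W, b)

def second_order_step(x_l, x_lm1, W, b):
    # k = 2 block: x_{l+1} = 2*x_l - x_{l-1} + f(x_l).
    # Central-difference discretization of d^2x/dt^2 = f(x); it needs skip
    # connections from the two preceding layers, and its state-space form
    # carries the pair (x_l, x_{l-1}), i.e. a 2*d-dimensional state.
    return 2.0 * x_l - x_lm1 + layer_fn(x_l, W, b)

rng = np.random.default_rng(0)
d, depth = 8, 5
x0 = rng.normal(size=d)

# k = 1: chain of residual blocks.
x = x0
for _ in range(depth):
    W, b = 0.1 * rng.normal(size=(d, d)), np.zeros(d)
    x = residual_step(x, W, b)

# k = 2: chain that keeps the two most recent activations as its state.
x_lm1, x_l = x0, x0
for _ in range(depth):
    W, b = 0.1 * rng.normal(size=(d, d)), np.zeros(d)
    x_lm1, x_l = x_l, second_order_step(x_l, x_lm1, W, b)
```

Carrying the pair (x_l, x_{l-1}) as the state in the k = 2 chain is what makes the enlarged k·d-dimensional state space, and hence the larger effective embedding dimension per parameter, explicit in this toy setting.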