Koopman operator theory, a powerful framework for discovering the underlying dynamics of nonlinear dynamical systems, was recently shown to be intimately connected with neural network training. In this work, we take the first steps in making use of this connection. Because Koopman operator theory is a linear theory, successfully applying it to the evolution of network weights and biases offers the promise of accelerated training, especially in the context of deep networks, where optimization is an inherently non-convex problem. We show that Koopman operator theory methods allow for accurate predictions of the weights and biases of a feedforward, fully connected deep network over a non-trivial range of training time. During this time window, we find that our approach is at least 10x faster than gradient descent based methods, in line with the results expected from our complexity analysis. We highlight additional methods by which our results can be extended to broader classes of networks and larger time intervals, which will be the focus of future work in this novel intersection between dynamical systems and neural network theory.
Introduction

Despite their black-box nature, the training of artificial neural networks (NNs) is a discrete dynamical system. During training, NN weights evolve along a trajectory in an abstract weight space, with the path determined by the implemented learning algorithm, the data used for training, and the network architecture. This dynamical systems picture is familiar, as many introductions to learning algorithms, such as gradient descent (GD), visualize training as a process whereby weights are changed iteratively under the influence of the loss landscape. Yet, while dynamical systems theory has provided insight into the behavior of many complex systems, its application to NNs has been limited.

Recent advances in Koopman operator theory (KOT) have made it a powerful tool for studying the underlying dynamics of nonlinear systems in a data-driven manner [1][2][3][4][5][6][7][8][9]. This raises the question: can KOT be used to learn and predict the dynamics present in NN training? If so, can such an approach, which we call Koopman training, afford us benefits that traditional NN training methods cannot?

* The authors contributed equally

Preprint. Under review.
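To make the data-driven flavor of such methods concrete, the following sketch (our own illustration, not the paper's implementation) applies dynamic mode decomposition (DMD), a standard finite-dimensional approximation of the Koopman operator, to snapshots of a toy "weight" vector evolving under a linear contraction that stands in for recorded training iterates. A linear map is fit to consecutive snapshot pairs and then rolled forward to predict the trajectory beyond the training window; all dimensions and dynamics here are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sketch: approximate the Koopman operator via DMD on weight
# snapshots, then predict future iterates with the fitted linear model.
rng = np.random.default_rng(0)

# Toy stand-in for a recorded training trajectory: a linear contraction
# w_{k+1} = M w_k with well-separated decay rates per mode.
d, T = 6, 20
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
M = Q @ np.diag(np.linspace(0.5, 0.95, d)) @ Q.T

snapshots = np.empty((d, T + 1))
snapshots[:, 0] = rng.normal(size=d)
for k in range(T):
    snapshots[:, k + 1] = M @ snapshots[:, k]

# DMD fit: find A with X' ≈ A X, using the Moore-Penrose pseudoinverse.
X, Xp = snapshots[:, :-1], snapshots[:, 1:]
A = Xp @ np.linalg.pinv(X)

# Roll the linear model forward 10 steps past the recorded trajectory and
# compare against the ground-truth dynamics.
pred = snapshots[:, -1].copy()
true = snapshots[:, -1].copy()
for _ in range(10):
    pred, true = A @ pred, M @ true

rel_err = np.linalg.norm(pred - true) / np.linalg.norm(true)
print(rel_err)
```

In a real network the snapshot matrix would hold flattened weight and bias vectors recorded over GD steps, and prediction with the fitted linear operator replaces further gradient evaluations over the window where the linear model remains accurate.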