It is by now well known that practical deep supervised learning can roughly be cast as an optimal control problem for a specific discrete-time, nonlinear dynamical system called an artificial neural network. In this work, we consider the continuous-time formulation of the deep supervised learning problem and study its behavior as the final time horizon increases, which can be interpreted as increasing the number of layers in the neural network setting. For the classical regularized empirical risk minimization problem, we show that, in long time, the optimal states approach the zero training error regime, while the optimal control parameters approach, on an appropriate scale, minimal-norm parameters whose corresponding states lie precisely in the zero training error regime. Seen from the large-layer perspective, this result provides an alternative theoretical underpinning of the notion that neural networks learn best in the overparametrized regime. We also propose a learning problem consisting of minimizing a cost with a state tracking term, and establish the well-known turnpike property, which indicates that the solutions of the learning problem on long time intervals consist of three pieces: the first and last are transient short-time arcs, while the middle piece is a long-time arc staying exponentially close to the optimal solution of an associated static learning problem. This property in fact provides a quantitative estimate of the number of layers required to reach the zero training error regime. Both of the aforementioned asymptotic regimes are addressed in the context of continuous-time and continuous space-time neural networks, the latter taking the form of nonlinear integro-differential equations, hence covering residual neural networks with both fixed and variable depths.
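For orientation, the continuous-time (neural ODE) formulation referred to above can be sketched as follows; the specific symbols here (loss \ell, readout map P, activation \sigma, regularization weight \lambda) are illustrative assumptions rather than the paper's notation:
\[
\min_{W,\,b}\ \frac{1}{N}\sum_{i=1}^{N} \ell\big(P\,x_i(T),\, y_i\big) \;+\; \lambda \int_0^T \big(\|W(t)\|^2 + \|b(t)\|^2\big)\,\mathrm{d}t
\quad\text{subject to}\quad
\dot{x}_i(t) = \sigma\big(W(t)\,x_i(t) + b(t)\big),\qquad x_i(0) = x_i^0,
\]
where the horizon T plays the role of the number of layers, and the zero training error regime corresponds to the outputs P\,x_i(T) matching the labels y_i exactly.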
In this work, we address the local controllability of a one-dimensional free boundary problem for a fluid governed by the viscous Burgers equation. The free boundary manifests itself as one moving end of the interval, and its evolution is given by the value of the fluid velocity at this endpoint. We prove that, by means of a control acting on the fixed boundary, we may steer the fluid to a constant velocity while also prescribing the position of the free boundary, provided the initial velocity and interface position are sufficiently close to the targets.
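Schematically, and with notation chosen here for illustration rather than taken from the paper (further boundary and compatibility conditions at the interface are omitted), such a system can be written as
\[
\begin{cases}
y_t - y_{xx} + y\,y_x = 0, & x \in \big(0, \mathcal{L}(t)\big),\ t \in (0,T),\\
y(t,0) = u(t), & \text{(control on the fixed boundary)},\\
\dot{\mathcal{L}}(t) = y\big(t, \mathcal{L}(t)\big), & \text{(the interface moves with the fluid velocity)},
\end{cases}
\]
with the goal of reaching a constant velocity profile y(T,\cdot) together with a prescribed value of the interface position \mathcal{L}(T).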
We present a new proof of the turnpike property for nonlinear optimal control problems when the running target is a steady control-state pair of the underlying system. Our strategy combines the construction of quasi-turnpike controls via controllability with a bootstrap argument, and does not rely on analyzing the optimality system or on linearization techniques. This in turn allows us to address several optimal control problems for finite-dimensional, control-affine systems with globally Lipschitz (possibly nonsmooth) nonlinearities, without any smallness conditions on the initial data or the running target. These results are motivated by applications in machine learning through deep residual neural networks, which fit within our setting. We show that our methodology also applies to controlled PDEs, such as the semilinear wave and heat equations with a globally Lipschitz nonlinearity, once again without any smallness assumptions.
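For concreteness, the exponential turnpike property takes the following standard quantitative form, stated here as a sketch with generic constants: if (\bar{u}, \bar{x}) denotes the optimal steady control-state pair of the associated static problem, then the optimizers (u_T, x_T) on the horizon [0,T] satisfy
\[
\|u_T(t) - \bar{u}\| + \|x_T(t) - \bar{x}\| \;\le\; C\big(e^{-\mu t} + e^{-\mu (T-t)}\big), \qquad t \in [0,T],
\]
for some constants C, \mu > 0 independent of T; away from the two transient arcs near t = 0 and t = T, the optimal trajectory thus stays exponentially close to the steady optimum.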
In this work, we investigate the null-controllability of a nonlinear degenerate parabolic equation, namely the equation satisfied by a perturbation around the self-similar solution of the porous medium equation in Lagrangian-like coordinates. We prove a local null-controllability result for a regularized version of the nonlinear problem, in which singular terms have been removed from the nonlinearity. We use spectral techniques and the source-term method to deal with the linearized problem, and the conclusion follows from a Banach fixed-point argument. The spectral techniques are also used to prove a null-controllability result for the linearized thin-film equation, a degenerate fourth-order analog of the problem under consideration.
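The linearization-plus-fixed-point strategy described above can be summarized schematically, with generic operators \mathcal{A}, B and nonlinearity \mathcal{N} standing in for the actual regularized problem (this outline illustrates the source-term method in general, not the paper's precise statements):
\[
\begin{aligned}
&\text{(i) Linear step: for } \partial_t v + \mathcal{A} v = f + B h,\ v(0) = v_0, \text{ find a control } h \text{ with } v(T) = 0,\\
&\qquad \text{together with weighted estimates on } (v, h) \text{ in terms of } (v_0, f);\\
&\text{(ii) Fixed point: show that } f \mapsto \mathcal{N}(v_f) \text{ is a contraction on a small ball, so that Banach's theorem}\\
&\qquad \text{yields a solution of the nonlinear problem with } v(T) = 0 \text{ for sufficiently small } v_0.
\end{aligned}
\]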