Deep neural networks have become invaluable tools for supervised machine learning, e.g., the classification of text or images. While they often offer superior results over traditional techniques and successfully express complicated patterns in data, deep architectures are known to be challenging to design and train so that they generalize well to new data. Critical issues with deep architectures are the numerical instabilities in derivative-based learning algorithms commonly called exploding or vanishing gradients. In this paper, we propose new forward propagation techniques inspired by systems of Ordinary Differential Equations (ODE) that overcome this challenge and lead to well-posed learning problems for arbitrarily deep networks. The backbone of our approach is our interpretation of deep learning as a parameter estimation problem of nonlinear dynamical systems. Given this formulation, we analyze the stability and well-posedness of deep learning and use this new understanding to develop new network architectures. We relate the exploding and vanishing gradient phenomenon to the stability of the discrete ODE and present several strategies for stabilizing deep learning in very deep networks. While our new architectures restrict the solution space, several numerical experiments show their competitiveness with state-of-the-art networks.
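To make the ODE view concrete, the sketch below reads a ResNet forward pass as a forward Euler discretization of dY/dt = σ(K(t)Y + b(t)). This is an illustration rather than the paper's reference implementation; the tanh activation, the step size h, and the antisymmetric parameterization (one possible stabilization strategy) are assumptions.

```python
import numpy as np

def antisymmetric(W):
    # An antisymmetric matrix has purely imaginary eigenvalues, so the
    # underlying ODE neither amplifies nor damps signals: one way to keep
    # forward propagation (and hence gradients) stable at any depth.
    return 0.5 * (W - W.T)

def forward_euler_resnet(Y, weights, biases, h=0.1):
    # ResNet update Y <- Y + h * sigma(K Y + b), i.e., forward Euler
    # applied to the continuous dynamics dY/dt = sigma(K(t) Y + b(t)).
    for W, b in zip(weights, biases):
        K = antisymmetric(W)
        Y = Y + h * np.tanh(K @ Y + b)
    return Y
```

In this reading, exploding or vanishing gradients correspond to instability of the discrete ODE, governed by the eigenvalues of the Jacobian of the right-hand side and by the step size h.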
Partial differential equations (PDEs) are indispensable for modeling many physical phenomena and are also commonly used to solve image processing tasks. In the latter area, PDE-based approaches interpret image data as discretizations of multivariate functions and the output of image processing algorithms as solutions to certain PDEs. Posing image processing problems in the infinite-dimensional setting provides powerful tools for their analysis and solution. Over the last few decades, the reinterpretation of classical image processing problems through the PDE lens has produced multiple celebrated approaches that benefit a vast range of tasks, including image segmentation, denoising, registration, and reconstruction. In this paper, we establish a new PDE interpretation of a class of deep convolutional neural networks (CNN) that are commonly used to learn from speech, image, and video data. Our interpretation includes convolutional residual neural networks (ResNets), which are among the most promising approaches for tasks such as image classification and have improved the state-of-the-art performance in prestigious benchmark challenges. Despite their recent successes, deep ResNets still face critical challenges associated with their design, immense computational costs and memory requirements, and lack of understanding of their reasoning. Guided by well-established PDE theory, we derive three new ResNet architectures that fall into two new classes: parabolic and hyperbolic CNNs. We demonstrate how PDE theory can provide new insights and algorithms for deep learning and show the competitiveness of three new CNN architectures using numerical experiments.
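A minimal sketch of the two layer types follows, under assumed notation: tanh stands in for the activation and plain matrix products stand in for convolutions.

```python
import numpy as np

def parabolic_step(Y, K, b, h=0.1):
    # Forward Euler step of the parabolic (diffusion-like) dynamics
    # dY/dt = -K(t)^T sigma(K(t) Y + b(t)); the -K^T ... K structure
    # mirrors the heat equation and smooths the features.
    return Y - h * K.T @ np.tanh(K @ Y + b)

def hyperbolic_step(Y_curr, Y_prev, K, b, h=0.1):
    # Leapfrog step of the hyperbolic (wave-like) second-order dynamics
    # d^2Y/dt^2 = -K(t)^T sigma(K(t) Y + b(t)); such dynamics
    # approximately conserve energy and can be reversed in time,
    # which permits recomputing hidden states instead of storing them.
    return 2.0 * Y_curr - Y_prev - h**2 * (K.T @ np.tanh(K @ Y_curr + b))
```

The parabolic class favors smoothing and robustness; the hyperbolic class preserves information across layers, which is attractive for memory-efficient training.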
This paper considers optimization techniques for the solution of nonlinear inverse problems where the forward problems, like those encountered in electromagnetics, are modelled by differential equations. Such problems are often solved using a Gauss-Newton method in which the forward model constraints are implicitly incorporated. Variants of Newton's method that use second-derivative information are rarely employed because their perceived disadvantage in computational cost per step offsets their potential benefit of faster convergence. In this paper we show that, by formulating the inversion as a constrained or unconstrained optimization problem, and by employing sparse matrix techniques, we can carry out variants of sequential quadratic programming and the full Newton iteration at only a modest additional cost. By working with the differential equation explicitly, we are able to relate the constrained and the unconstrained formulations and discuss the advantages of each. To make the comparisons meaningful, we adopt the same global optimization strategy for all inversions. As an illustration, we focus on a 1D electromagnetic (EM) example simulating a magnetotelluric survey. This problem is sufficiently rich that it illuminates most of the computational complexities prevalent in multi-source inverse problems, and we therefore describe its solution process in detail. The numerical results illustrate that variants of Newton's method that utilize second-derivative information can produce a solution in fewer iterations and, in some cases where the data contain significant noise, with fewer floating-point operations than Gauss-Newton techniques. Although further research is required, we believe that the variants proposed here will have a significant impact on developing practical solutions to large-scale 3D EM inverse problems.
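The distinction between the two Hessians can be stated in a line. Writing the misfit (notation assumed here, not taken from the paper) as r(m) = F(m) - d for forward model F, data d, and model m:

```latex
\phi(m) = \tfrac{1}{2}\,\|r(m)\|^{2}, \qquad
\nabla^{2}\phi(m)
  = \underbrace{J(m)^{\top} J(m)}_{\text{kept by Gauss--Newton}}
  \;+\;
  \underbrace{\sum_{i} r_{i}(m)\,\nabla^{2} r_{i}(m)}_{\text{second-derivative term, dropped by Gauss--Newton}}
```

where J is the Jacobian of F. The dropped term matters precisely when the residuals are large, e.g., for noisy data, which is where the full Newton variants can pay off.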
A particular problem in image registration arises for multimodal images taken from different imaging devices and/or modalities. Starting in 1995, mutual information has been shown to be a very successful distance measure for multimodal image registration. However, mutual information also has a number of well-known drawbacks. Its main disadvantage is that it is known to be highly non-convex and typically has many local maxima. This observation motivates us to seek a different image similarity measure that is better suited for optimization yet still capable of handling multimodal images. In this work we investigate an alternative distance measure based on normalized gradients and compare its performance to mutual information. We call the new distance measure Normalized Gradient Fields (NGF).
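One common form of the measure (the normalization and the edge parameter ε are assumed notation) compares gradient directions rather than intensities, which is what makes it insensitive to the modality-dependent intensity mapping:

```latex
n_{\varepsilon}(I, x) = \frac{\nabla I(x)}{\sqrt{\|\nabla I(x)\|^{2} + \varepsilon^{2}}}, \qquad
D^{\mathrm{NGF}}(R, T) = \frac{1}{2} \int_{\Omega}
  1 - \big\langle n_{\varepsilon}(R, x),\, n_{\varepsilon}(T, x) \big\rangle^{2} \, dx
```

Here R and T are the reference and template images on the domain Ω. The measure is minimal where edges align (in either direction), and, unlike mutual information, it is a smooth pointwise expression amenable to standard derivative-based optimization.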
While experimental design for well-posed linear inverse problems has been well studied, covering a vast range of well-established design criteria and optimization algorithms, its ill-posed counterpart is a rather new topic. The ill-posed nature of the problem entails the incorporation of regularization techniques. The consequent non-stochastic error introduced by regularization needs to be taken into account when choosing an experimental design criterion. We discuss different ways to define an optimal design that controls both an average total error of regularized estimates and a measure of the total cost of the design. We also introduce a numerical framework that efficiently implements such designs and natively allows for the solution of large-scale problems. To illustrate possible applications of the methodology, we consider a borehole tomography example and a two-dimensional function recovery problem.
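A sketch of one such criterion follows; the weighted Tikhonov estimator and the ℓ1 cost on the design weights w are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def regularized_estimate(A, d, w, alpha, L):
    # Weighted Tikhonov estimate for design weights w >= 0:
    # m(w) = argmin_m ||diag(sqrt(w)) (A m - d)||^2 + alpha ||L m||^2.
    W = np.diag(w)
    H = A.T @ W @ A + alpha * (L.T @ L)
    return np.linalg.solve(H, A.T @ W @ d)

def design_objective(w, A, alpha, L, models, noise_std, beta, seed=0):
    # Average total error of the regularized estimates over training models
    # (capturing both noise propagation and the non-stochastic
    # regularization bias), plus a sparsity-promoting surrogate for the
    # total cost of the design.
    rng = np.random.default_rng(seed)
    err = 0.0
    for m in models:
        d = A @ m + noise_std * rng.standard_normal(A.shape[0])
        err += np.sum((regularized_estimate(A, d, w, alpha, L) - m) ** 2)
    return err / len(models) + beta * np.sum(np.abs(w))
```

Minimizing such an objective over w trades reconstruction accuracy against the number of active measurements; a sparse w means few experiments are actually carried out.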