Natural spatiotemporal processes can be highly non-stationary in many ways, e.g., low-level non-stationarity such as spatial correlations or temporal dependencies of local pixel values, and high-level variations such as the accumulation, deformation, or dissipation of radar echoes in precipitation forecasting. By Cramér's decomposition [4], any non-stationary process can be decomposed into deterministic, time-variant polynomials plus a zero-mean stochastic term. By applying differencing operations appropriately, we can turn time-variant polynomials into a constant, making the deterministic component predictable. However, most previous recurrent neural networks for spatiotemporal prediction do not use differential signals effectively, and their relatively simple state transition functions prevent them from learning complex variations in spacetime. We propose the Memory In Memory (MIM) networks and corresponding recurrent blocks for this purpose. The MIM blocks exploit the differential signals between adjacent recurrent states to model the non-stationary and approximately stationary properties in spatiotemporal dynamics with two cascaded, self-renewed memory modules. By stacking multiple MIM blocks, we can potentially handle higher-order non-stationarity. The MIM networks achieve state-of-the-art results on four spatiotemporal prediction tasks across both synthetic and real-world datasets. We believe that the general idea of this work can potentially be applied to other time-series forecasting tasks.
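To make the architectural idea concrete, the following is a minimal, illustrative sketch of a MIM-style recurrent block in PyTorch. It is an assumption-laden simplification, not the paper's exact formulation: the module names (DiffModule, MIMBlockSketch), the ConvLSTM-style gating, and all dimensions are chosen for clarity only.

```python
# Illustrative sketch only: a heavily simplified, MIM-inspired recurrent cell.
# Gate structure, module names, and dimensions are assumptions for clarity,
# not the exact formulation from the paper.
import torch
import torch.nn as nn

class DiffModule(nn.Module):
    """Self-renewed memory driven by a differential (or residual) input signal."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, 4 * channels, kernel_size=3, padding=1)

    def forward(self, signal, mem):
        # ConvLSTM-style gating computed from the input signal and current memory.
        i, f, g, o = torch.chunk(self.conv(torch.cat([signal, mem], dim=1)), 4, dim=1)
        mem = torch.sigmoid(f) * mem + torch.sigmoid(i) * torch.tanh(g)
        return torch.sigmoid(o) * torch.tanh(mem), mem

class MIMBlockSketch(nn.Module):
    """Two cascaded memories: one fed by the difference of adjacent hidden states
    (non-stationary part), one fed by the residual (approximately stationary part)."""
    def __init__(self, channels):
        super().__init__()
        self.nonstat = DiffModule(channels)  # models the differenced, non-stationary component
        self.stat = DiffModule(channels)     # models what remains after differencing

    def forward(self, h_curr, h_prev, mem_n, mem_s):
        diff = h_curr - h_prev                           # differencing of adjacent recurrent states
        d_out, mem_n = self.nonstat(diff, mem_n)         # first cascaded memory
        s_out, mem_s = self.stat(h_curr - d_out, mem_s)  # second cascaded memory
        return s_out, mem_n, mem_s
```

The point the sketch tries to convey is the cascade: the first memory consumes the difference of adjacent hidden states (the differenced, non-stationary component), while the second memory consumes what remains, mirroring the decomposition into a deterministic trend and an approximately stationary residual.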
We propose an infinite-dimensional adjoint-based inexact Gauss-Newton method for the solution of inverse problems governed by Stokes models of ice sheet flow with nonlinear rheology and sliding law. The method is applied to infer the basal sliding coefficient and the rheological exponent parameter fields from surface velocities. The inverse problem is formulated as a nonlinear least-squares optimization problem whose cost functional is the misfit between surface velocity observations and model predictions. A Tikhonov regularization term is added to the cost functional to render the problem well-posed and account for observational error. Our findings show that the inexact Newton method is significantly more efficient than the nonlinear conjugate gradient method and that the number of Stokes solutions required to solve the inverse problem is insensitive to the number of inversion parameters. The results also show that the reconstructions of the basal sliding coefficient converge to the exact sliding coefficient as the observation error (here, the noise added to synthetic observations) decreases, and that a nonlinear rheology makes the reconstruction of the basal sliding coefficient more difficult. For the inversion of the rheology exponent field, we find that horizontally constant or smoothly varying parameter fields can be reconstructed satisfactorily from noisy observations.
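Schematically, the regularized least-squares cost functional described above takes the following generic form; the specific norms, observation boundary, and regularization functional written here are assumptions for illustration, not necessarily the paper's exact choices:

\[
\mathcal{J}(m) \;=\; \frac{1}{2}\int_{\Gamma_{\mathrm{top}}} \bigl|\mathbf{u}(m) - \mathbf{u}^{\mathrm{obs}}\bigr|^{2}\, ds \;+\; \frac{\beta}{2}\int_{\Omega} \nabla m \cdot \nabla m \, dx,
\]

where \(\mathbf{u}(m)\) is the surface velocity predicted by the nonlinear Stokes model for the inversion parameter field \(m\) (e.g., the basal sliding coefficient or the rheological exponent), \(\mathbf{u}^{\mathrm{obs}}\) are the observed surface velocities on the top boundary \(\Gamma_{\mathrm{top}}\), and \(\beta\) weights the Tikhonov regularization term that renders the problem well-posed.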
The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference, i.e., how to efficiently execute already-trained models, and (ii) image classification networks as the primary benchmark for evaluation. Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD, that uses a representative set of DNN models covering a wide range of machine learning applications: image classification, machine translation, speech recognition, object detection, adversarial networks, and reinforcement learning; and (ii) performing an extensive performance analysis of training these different applications on three major deep learning frameworks (TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU, multi-GPU, and multi-machine). TBD currently covers six major application domains and eight different state-of-the-art models. We present a new toolchain for performance analysis of these models that combines the targeted usage of existing performance analysis tools, careful selection of new and existing metrics and methodologies to analyze the results, and utilization of domain-specific characteristics of DNN training. We also build a new set of tools for memory profiling in all three major frameworks: much-needed tools that can finally shed some light on precisely how much memory is consumed by different data structures (weights, activations, gradients, workspace) in DNN training. Using our tools and methodologies, we make several important observations and recommendations on where future research and optimization of DNN training should be focused.
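For intuition about what per-category memory accounting involves, here is a rough, self-contained sketch. It is not the TBD toolchain, and it uses PyTorch for brevity even though the paper's tools target TensorFlow, MXNet, and CNTK; the model choice and the peak-memory heuristic are assumptions made only to illustrate the idea.

```python
# Illustrative sketch only: crude attribution of GPU memory to weights and
# gradients for a PyTorch model; activations and framework workspace are only
# approximated from the peak allocation. NOT the paper's memory profiler.
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
x = torch.randn(32, 3, 224, 224, device="cuda")

weight_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

torch.cuda.reset_peak_memory_stats()
out = model(x)            # forward pass: activations are kept for backward
out.sum().backward()      # backward pass: gradients are materialized
grad_bytes = sum(p.grad.numel() * p.grad.element_size()
                 for p in model.parameters() if p.grad is not None)

peak_bytes = torch.cuda.max_memory_allocated()
# Whatever the peak exceeds weights + gradients by serves as a crude proxy
# for activations plus framework workspace.
other_bytes = peak_bytes - weight_bytes - grad_bytes

print(f"weights:                         {weight_bytes / 2**20:8.1f} MiB")
print(f"gradients:                       {grad_bytes / 2**20:8.1f} MiB")
print(f"activations + workspace (approx):{other_bytes / 2**20:8.1f} MiB")
```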