The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, it describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote its own implementation of its method to produce predictions at the given locations, and each implementation was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online at 10.1007/s13253-018-00348-w.
Vecchia's approximate likelihood for Gaussian process parameters depends on how the observations are ordered, which has been cited as a deficiency. This article takes the alternative standpoint that the ordering can be tuned to sharpen the approximations. Indeed, the first part of the paper includes a systematic study of how ordering affects the accuracy of Vecchia's approximation. We demonstrate the surprising result that random orderings can give dramatically sharper approximations than default coordinate-based orderings. Additional ordering schemes are described and analyzed numerically, including orderings capable of improving on random orderings. The second contribution of this paper is a new automatic method for grouping calculations of components of the approximation. The grouping methods simultaneously improve approximation accuracy and reduce computational burden. In common settings, reordering combined with grouping reduces Kullback-Leibler divergence from the target model by more than a factor of 60 compared to ungrouped approximations with default ordering. The claims are supported by theory and numerical results with comparisons to other approximations, including tapered covariances and stochastic partial differential equations. Computational details are provided, including the use of the approximations for prediction and conditional simulation. An application to space-time satellite data is presented.
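The ordering effect described above can be sketched numerically. The following minimal NumPy example (our own illustrative assumptions throughout: an exponential covariance, unit-square locations, and the names `expcov` and `vecchia_loglik` — it is not the paper's implementation) evaluates Vecchia's approximate log-likelihood under a default coordinate-based ordering and under a random ordering, alongside the exact value:

```python
import numpy as np

def expcov(A, B, rho=0.3):
    # isotropic exponential covariance between two sets of 2-D locations
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-d / rho)

def vecchia_loglik(y, X, order, m=5):
    # Vecchia's approximation: sum of Gaussian log-densities of each
    # observation conditional on <= m nearest previously ordered neighbors
    y, X = y[order], X[order]
    n, ll = len(y), 0.0
    for i in range(n):
        if i == 0:
            mean, var = 0.0, expcov(X[:1], X[:1])[0, 0]
        else:
            nb = np.argsort(np.linalg.norm(X[:i] - X[i], axis=1))[:m]
            K = expcov(X[nb], X[nb])
            k = expcov(X[nb], X[i:i + 1])[:, 0]
            w = np.linalg.solve(K, k)          # kriging weights on neighbors
            mean = w @ y[nb]
            var = expcov(X[i:i + 1], X[i:i + 1])[0, 0] - w @ k
        ll -= 0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(size=(n, 2))
K = expcov(X, X)
y = np.linalg.cholesky(K + 1e-10 * np.eye(n)) @ rng.standard_normal(n)

# exact Gaussian log-likelihood for reference
sign, logdet = np.linalg.slogdet(K)
exact = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(K, y))

coord_order = np.argsort(X[:, 0])   # default left-to-right coordinate ordering
rand_order = rng.permutation(n)     # random ordering
print(exact, vecchia_loglik(y, X, coord_order), vecchia_loglik(y, X, rand_order))
```

The gap between each approximate value and the exact one is the quantity whose dependence on ordering (and on grouping of the conditional terms) the paper studies.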
Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but they are computationally infeasible for large datasets. Focusing on the typical setting of modeling data as a GP plus an additive noise term, we propose a generalization of the Vecchia (J. Roy. Statist. Soc. Ser. B 50 (1988) 297-312) approach as a framework for GP approximations. We show that our general Vecchia approach contains many popular existing GP approximations as special cases, allowing for comparisons among the different methods within a unified framework. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose a novel sparse general Vecchia approximation, which ensures computational feasibility for large spatial datasets but can lead to considerable improvements in approximation accuracy over Vecchia's original approach. We provide several theoretical results and conduct numerical comparisons. We conclude with guidelines for the use of Vecchia approximations in spatial statistics.
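The sparsity property underlying such approximations can be made concrete: a Vecchia-type approximation implies a Gaussian density whose precision matrix factors as U Uᵀ for a sparse upper-triangular U, with column i supported only on i and its conditioning set. The sketch below (our own illustrative assumptions — exponential covariance, nearest-neighbor conditioning sets, hypothetical helper names — not the paper's code) builds U and evaluates the approximate log-density from it:

```python
import numpy as np

def expcov(A, B, rho=0.3):
    # isotropic exponential covariance between two sets of 2-D locations
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-d / rho)

def vecchia_U(X, m=5):
    # Upper-triangular factor with implied precision U @ U.T; column i is
    # nonzero only at row i and its <= m nearest previously indexed neighbors.
    n = len(X)
    U = np.zeros((n, n))
    for i in range(n):
        if i == 0:
            U[0, 0] = 1.0 / np.sqrt(expcov(X[:1], X[:1])[0, 0])
            continue
        nb = np.argsort(np.linalg.norm(X[:i] - X[i], axis=1))[:m]
        K = expcov(X[nb], X[nb])
        k = expcov(X[nb], X[i:i + 1])[:, 0]
        b = np.linalg.solve(K, k)              # kriging weights on neighbors
        cond_sd = np.sqrt(expcov(X[i:i + 1], X[i:i + 1])[0, 0] - b @ k)
        U[i, i] = 1.0 / cond_sd
        U[nb, i] = -b / cond_sd
    return U

def loglik_from_U(y, U):
    # log N(y; 0, (U U^T)^{-1}) evaluated entirely through the sparse factor
    n = len(y)
    return (np.log(np.diag(U)).sum()
            - 0.5 * (n * np.log(2 * np.pi) + ((U.T @ y) ** 2).sum()))

rng = np.random.default_rng(1)
n = 150
X = rng.uniform(size=(n, 2))
U = vecchia_U(X, m=5)
print("max nonzeros per column:", int((U != 0).sum(axis=0).max()))  # at most m + 1
```

With at most m + 1 nonzeros per column, all linear algebra needed for likelihood evaluation and prediction scales with m rather than n, which is the computational point the framework makes precise.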
Abstract. Herein, we present a description of the Mechanism of Intermediate complexity for Modelling Iron (MIMI v1.0). This iron processing module was developed for use within Earth system models and has been updated within a modal aerosol framework from the original implementation in a bulk aerosol model. MIMI simulates the emission and atmospheric processing of two main sources of iron in aerosol prior to deposition: mineral dust and combustion processes. Atmospheric dissolution of insoluble to soluble iron is parameterized by an acidic interstitial aerosol reaction and a separate in-cloud aerosol reaction scheme based on observations of enhanced aerosol iron solubility in the presence of oxalate. Updates include a more comprehensive treatment of combustion iron emissions, improvements to the iron dissolution scheme, and an improved physical dust mobilization scheme. An extensive dataset consisting predominantly of cruise-based observations was compiled for comparison with the model. The annual mean modelled concentration of surface-level total iron compared well with observations, but less so for the soluble fraction (iron solubility), for which observations are much more variable in space and time. Comparing model and observational data is sensitive to the definition of the average as well as to the temporal and spatial range over which it is calculated. Through statistical analysis and examples, we show that a median or log-normal distribution is preferred when comparing with soluble iron observations. The iron solubility calculated at each model time step is on average one-third (34 %) higher globally than that calculated from the ratio of the monthly mean values, which is routinely presented in aerosol studies and used in ocean biogeochemistry models. We redefined ocean deposition regions based on dominant iron emission sources and found that the daily variability in soluble iron simulated by MIMI was larger than that of previous model simulations. MIMI simulated a general increase in soluble iron deposition to Southern Hemisphere oceans by a factor of 2 to 4 compared with the previous version, which has implications for our understanding of the ocean biogeochemistry of these predominantly iron-limited ocean regions.
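The sensitivity to how the solubility ratio is averaged can be illustrated with a toy calculation (entirely synthetic numbers of our own, not MIMI output): when high-dust days carry more total iron but a lower soluble fraction, the average of per-time-step solubilities exceeds the ratio of the period means, in the direction of the discrepancy reported above.

```python
import numpy as np

# Synthetic month of daily values (illustrative assumptions, not MIMI output):
# episodic high-dust days deposit more total iron but have a lower soluble fraction.
rng = np.random.default_rng(42)
total_fe = rng.lognormal(mean=0.0, sigma=1.0, size=30)   # daily total Fe
solubility = 0.1 / (1.0 + total_fe)                      # daily soluble fraction
soluble_fe = solubility * total_fe

per_step = np.mean(soluble_fe / total_fe)             # average of daily ratios
from_means = np.mean(soluble_fe) / np.mean(total_fe)  # ratio of monthly means
print(per_step, from_means)  # per_step is larger: mean days dominate the daily
                             # ratios, while dusty days dominate the monthly means
```

Because the ratio-of-means weights the high-total, low-solubility days more heavily, it systematically understates the per-time-step solubility whenever the two quantities are negatively associated, which is the averaging artefact the comparison above quantifies.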