SummaryWith scientific data available at geocoded locations, investigators are increasingly turning to spatial process models for carrying out statistical inference. Over the last decade, hierarchical models implemented through Markov chain Monte Carlo methods have become especially popular for spatial modelling, given their flexibility and power to fit models that would be infeasible with classical methods as well as their avoidance of possibly inappropriate asymptotics. However, fitting hierarchical spatial models often involves expensive matrix decompositions whose computational complexity increases in cubic order with the number of spatial locations, rendering such models infeasible for large spatial data sets. This computational burden is exacerbated in multivariate settings with several spatially dependent response variables. It is also aggravated when data are collected at frequent time points and spatiotemporal process models are used. With regard to this challenge, our contribution is to work with what we call predictive process models for spatial and spatiotemporal data. Every spatial (or spatiotemporal) process induces a predictive process model (in fact, arbitrarily many of them). The latter models project process realizations of the former to a lower dimensional subspace, thereby reducing the computational burden. Hence, we achieve the flexibility to accommodate non-stationary, non-Gaussian, possibly multivariate, possibly spatiotemporal processes in the context of large data sets. We discuss attractive theoretical properties of these predictive processes. We also provide a computational template encompassing these diverse settings. Finally, we illustrate the approach with simulated and real data sets.
Advances in Geographical Information Systems (GIS) and Global Positioning Systems (GPS) enable accurate geocoding of locations where scientific data are collected. This has encouraged collection of large spatial datasets in many fields and has generated considerable interest in statistical modeling for location-referenced spatial data. The setting where the number of locations yielding observations is too large to fit the desired hierarchical spatial random effects models using Markov chain Monte Carlo methods is considered. This problem is exacerbated in spatial-temporal and multivariate settings where many observations occur at each location. The recently proposed predictive process, motivated by kriging ideas, aims to maintain the richness of desired hierarchical spatial modeling specifications in the presence of large datasets. A shortcoming of the original formulation of the predictive process is that it induces a positive bias in the non-spatial error term of the models. A modified predictive process is proposed to address this problem. The predictive process approach is knot-based leading to questions regarding knot design. An algorithm is designed to achieve approximately optimal spatial placement of knots. Detailed illustrations of the modified predictive process using multivariate spatial regression with both a simulated and a real dataset are offered.
Gaussian process models have been widely used in spatial statistics but face tremendous computational challenges for very large data sets. The model fitting and spatial prediction of such models typically require O.n 3 / operations for a data set of size n. Various approximations of the covariance functions have been introduced to reduce the computational cost. However, most existing approximations cannot simultaneously capture both the large-and the small-scale spatial dependence. A new approximation scheme is developed to provide a high quality approximation to the covariance function at both the large and the small spatial scales. The new approximation is the summation of two parts: a reduced rank covariance and a compactly supported covariance obtained by tapering the covariance of the residual of the reduced rank approximation. Whereas the former part mainly captures the large-scale spatial variation, the latter part captures the small-scale, local variation that is unexplained by the former part. By combining the reduced rank representation and sparse matrix techniques, our approach allows for efficient computation for maximum likelihood estimation, spatial prediction and Bayesian inference. We illustrate the new approach with simulated and real data sets.
Many critical ecological issues require the analysis of large spatial point data sets - for example, modelling species distributions, abundance and spread from survey data. But modelling spatial relationships, especially in large point data sets, presents major computational challenges. We use a novel Bayesian hierarchical statistical approach, 'spatial predictive process' modelling, to predict the distribution of a major invasive plant species, Celastrus orbiculatus, in the northeastern USA. The model runs orders of magnitude faster than traditional geostatistical models on a large data set of c. 4000 points, and performs better than generalized linear models, generalized additive models and geographically weighted regression in cross-validation. We also use this approach to model simultaneously the distributions of a set of four major invasive species in a spatially explicit multivariate model. This multispecies analysis demonstrates that some pairs of species exhibit negative residual spatial covariation, suggesting potential competitive interaction or divergent responses to unmeasured factors.
We propose a hierarchical modeling approach for explaining a collection of spatially referenced time series of extreme values. We assume that the observations follow generalized extreme value (GEV) distributions whose locations and scales are jointly spatially dependent where the dependence is captured using multivariate Markov random field models specified through coregionalization. In addition, there is temporal dependence in the locations. There are various ways to provide appropriate specifications; we consider four choices. The models can be fitted using a Markov Chain Monte Carlo (MCMC) algorithm to enable inference for parameters and to provide spatio-temporal predictions. We fit the models to a set of gridded interpolated precipitation data collected over a 50-year period for the Cape Floristic Region in South Africa, summarizing results for what appears to be the best choice of model.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.