A number of models have been developed to estimate PM2.5 exposure, including satellite-based aerosol optical depth (AOD) models, land-use regression, and chemical transport model simulations, each with its own strengths and weaknesses. Variables such as the normalized difference vegetation index (NDVI), surface reflectance, the absorbing aerosol index, and meteorological fields are also informative about PM2.5 concentrations. Our objective was to establish a hybrid model that incorporates multiple approaches and input variables to improve model performance. To account for complex atmospheric mechanisms, we used a neural network for its capacity to model nonlinearity and interactions. We incorporated convolutional layers, which aggregate neighboring information, into the neural network to account for spatial and temporal autocorrelation. We trained the neural network for the continental United States from 2000 to 2012 and tested it with left-out monitors. Ten-fold cross-validation showed good model performance, with a total R2 of 0.84 on the left-out monitors. Regional R2 could be even higher for the Eastern and Central United States. Model performance remained good at low PM2.5 concentrations. We then used the trained neural network to make daily predictions of PM2.5 at 1 km × 1 km grid cells. This model allows epidemiologists to assess PM2.5 exposure in both the short term and the long term.
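The evaluation on left-out monitors can be illustrated with a minimal sketch of a monitor-grouped ten-fold split, in which every observation from a given monitor falls in the same fold. The function name `monitor_folds`, its parameters, and the toy monitor IDs are assumptions made for illustration; the abstract does not describe how the folds were actually constructed.

```python
import numpy as np

def monitor_folds(monitor_ids, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that hold out whole monitors.

    Hypothetical helper: monitors (not individual observations) are
    shuffled and split into k groups, so a held-out monitor never
    contributes observations to the training set of its own fold.
    """
    monitor_ids = np.asarray(monitor_ids)
    unique_monitors = np.unique(monitor_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_monitors)
    for held_out in np.array_split(shuffled, k):
        test_mask = np.isin(monitor_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy example: 25 monitors with 4 daily observations each.
toy_ids = np.repeat(np.arange(25), 4)
for train_idx, test_idx in monitor_folds(toy_ids, k=10, seed=1):
    # A model would be fitted on train_idx and scored on test_idx here.
    pass
```

Splitting by monitor rather than by observation means the reported score reflects prediction at locations the model has not seen, which is the situation when predicting at unmonitored grid cells.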
Background
The use of satellite-based aerosol optical depth (AOD) to estimate fine particulate matter (PM2.5) for epidemiology studies has increased substantially over the past few years. These recent studies often report moderate predictive power, which can generate downward bias in effect estimates. In addition, AOD measurements have only moderate spatial resolution and substantial missing data.
Methods
We made use of recent advances in MODIS satellite data processing algorithms (Multi-Angle Implementation of Atmospheric Correction, MAIAC), which allow us to use 1 km (versus the currently available 10 km) resolution AOD data. We developed and cross-validated models to predict daily PM2.5 at a 1×1 km resolution across the northeastern USA (New England, New York and New Jersey) for the years 2003–2011, allowing us to better differentiate daily and long-term exposure between urban, suburban, and rural areas. Additionally, we developed an approach that generates daily high-resolution 200 m localized predictions representing deviations from the area 1×1 km grid predictions. We used mixed models regressing PM2.5 measurements against day-specific random intercepts, and fixed and random AOD and temperature slopes. We then used generalized additive mixed models with spatial smoothing to generate grid cell predictions when AOD was missing. Finally, to obtain the 200 m localized predictions, we regressed the residuals from the final model for each monitor against the local spatial and temporal variables at each monitoring site.
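The first-stage mixed model described above can be written out as a sketch. The notation is an assumption made for illustration: i indexes monitoring sites, j indexes days, and the random slopes are taken to vary by day in the same way as the random intercepts, which the abstract states only for the intercepts. Any additional covariates in the full model are omitted.

```latex
\begin{equation}
\mathrm{PM}_{ij} = (\alpha + u_j)
  + (\beta_1 + v_j)\,\mathrm{AOD}_{ij}
  + (\beta_2 + w_j)\,\mathrm{Temp}_{ij}
  + \varepsilon_{ij},
\qquad
(u_j, v_j, w_j)^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{equation}
```

Here \alpha, \beta_1 and \beta_2 are the fixed intercept and slopes, and u_j, v_j and w_j are the day-specific random deviations, so the AOD to PM2.5 relationship is allowed to change from one day to the next.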
Results
Our model performance was excellent (mean out-of-sample R2=0.88). The spatial and temporal components of the out-of-sample results also showed very good fits to the withheld data (R2=0.87 and R2=0.87, respectively). In addition, our results revealed very little bias in the predicted concentrations (slope of predictions versus withheld observations = 0.99).
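As a rough illustration of how such fit statistics can be computed from withheld data, the sketch below regresses observed values on predictions to obtain R2 and the slope, then repeats this for a spatial component (monitor means) and a temporal component (daily deviations from each monitor's mean). This decomposition and the direction of the regression are assumptions; the abstract does not define them, and the function names are hypothetical.

```python
import numpy as np

def fit_stats(observed, predicted):
    """Return (R2, slope) from a linear regression of observed on predicted."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, _intercept = np.polyfit(predicted, observed, 1)
    r = np.corrcoef(observed, predicted)[0, 1]
    return r ** 2, slope

def spatial_temporal_stats(monitor_ids, observed, predicted):
    """Return ((R2, slope) spatial, (R2, slope) temporal).

    Assumed decomposition: spatial compares per-monitor means;
    temporal compares deviations from each monitor's own mean.
    """
    monitor_ids = np.asarray(monitor_ids).ravel()
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    _monitors, inverse = np.unique(monitor_ids, return_inverse=True)
    counts = np.bincount(inverse)
    obs_mean = np.bincount(inverse, weights=observed) / counts
    pred_mean = np.bincount(inverse, weights=predicted) / counts
    spatial = fit_stats(obs_mean, pred_mean)
    temporal = fit_stats(observed - obs_mean[inverse],
                         predicted - pred_mean[inverse])
    return spatial, temporal
```

A slope close to 1 indicates that predictions are neither systematically compressed nor inflated relative to the withheld observations, which is what the reported value of 0.99 suggests.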
Conclusion
Our daily model results show high predictive accuracy at high spatial resolutions and will be useful in reconstructing exposure histories for epidemiological studies across this region.