Low-cost air quality sensors are promising supplements to regulatory monitors for PM2.5 exposure assessment. However, little has been done to incorporate low-cost sensor measurements into large-scale PM2.5 exposure modeling. We conducted spatially varying calibration and developed a down-weighting strategy to optimize the use of low-cost sensor data in PM2.5 estimation. In California, PurpleAir low-cost sensors were paired with Air Quality System (AQS) regulatory stations, and the sensors were calibrated by geographically weighted regression. The calibrated PurpleAir measurements were then given lower weights according to their residual errors and combined with AQS measurements in a random forest model to generate daily PM2.5 estimates at 1 km resolution. The calibration reduced PurpleAir’s systematic bias to ∼0 μg/m3 and its residual errors by 36%. Increased sensor bias was associated with higher temperature and humidity, as well as longer operating time. The weighted prediction model outperformed the AQS-based prediction model, with an improved random cross-validation (CV) R2 of 0.86, an improved spatial CV R2 of 0.81, and a lower prediction error. The temporal CV R2 did not improve because of the temporal discontinuity of the PurpleAir data. Including PurpleAir data allowed the predictions to better reflect spatial details and hotspots of PM2.5.
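The two steps described in this abstract, a spatially varying calibration and a residual-based down-weighting, can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the 50 km bandwidth, the single-predictor linear form, the `(x_km, y_km, sensor, reference)` tuple layout, and the `sample_weight` rule are all assumptions chosen for illustration.

```python
import math


def gwr_calibrate(pairs, target_xy, raw_value, bandwidth_km=50.0):
    """Locally weighted linear calibration at one location (illustrative).

    pairs: list of (x_km, y_km, sensor_pm25, reference_pm25) tuples from
           collocated sensor/monitor pairs (hypothetical layout).
    Returns (calibrated_value, intercept, slope).
    """
    tx, ty = target_xy
    # Gaussian distance-decay kernel: nearby pairs influence the fit more.
    w = [math.exp(-0.5 * (math.hypot(x - tx, y - ty) / bandwidth_km) ** 2)
         for x, y, _, _ in pairs]
    sw = sum(w)
    mean_s = sum(wi * p[2] for wi, p in zip(w, pairs)) / sw
    mean_r = sum(wi * p[3] for wi, p in zip(w, pairs)) / sw
    # Closed-form weighted least squares for reference = a + b * sensor.
    sxx = sum(wi * (p[2] - mean_s) ** 2 for wi, p in zip(w, pairs))
    sxy = sum(wi * (p[2] - mean_s) * (p[3] - mean_r) for wi, p in zip(w, pairs))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_s
    return intercept + slope * raw_value, intercept, slope


def sample_weight(residual_error, reference_error=1.0):
    """Hypothetical down-weighting rule: the weight shrinks as a sensor's
    residual error grows relative to a reference error, and is capped at 1."""
    if residual_error <= reference_error:
        return 1.0
    return (reference_error / residual_error) ** 2


# Toy collocation data in which reference = 2 + 0.5 * sensor everywhere.
pairs = [(0.0, 0.0, 10.0, 7.0), (10.0, 0.0, 20.0, 12.0),
         (0.0, 10.0, 30.0, 17.0), (100.0, 100.0, 40.0, 22.0)]
calibrated, intercept, slope = gwr_calibrate(pairs, (5.0, 5.0), 40.0)
```

In a full workflow the coefficients would be re-estimated at every sensor location, and the resulting weights could be passed to the prediction model as per-sample weights.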
Satellite aerosol optical depth (AOD) has been widely used to estimate ground-level fine particle (PM2.5) concentrations, but snow and cloud cover often leaves a large proportion of AOD values missing in a non-random way. As a result, full-coverage, unbiased PM2.5 estimates are hard to generate. Among current approaches to this data gap, few have considered the cloud-AOD relationship and none have considered the snow-AOD relationship. This study examined the impacts of snow and cloud cover on AOD and PM2.5 and made full-coverage PM2.5 predictions that account for these impacts. To estimate missing AOD values, daily gap-filling models with snow/cloud fractions and meteorological covariates were developed using the random forest algorithm. Applying these models in New York State produced a daily AOD data set with 1-km resolution and complete coverage. The "out-of-bag" R2 of the gap-filling models averaged 0.93, with an interquartile range from 0.90 to 0.95. Subsequently, a random forest-based PM2.5 prediction model using the gap-filled AOD and covariates was built to produce full-coverage PM2.5 estimates. Ten-fold cross-validation of the prediction model showed good performance, with an R2 of 0.82. In the gap-filling models, the snow fraction was more important in the snow season than in the rest of the year. Prediction models fitted with and without the snow fraction also showed discernible changes in PM2.5 patterns, further confirming the importance of this parameter. Compared with methods that do not consider snow and cloud cover, our PM2.5 prediction surfaces showed more spatial detail and reflected small-scale, terrain-driven PM2.5 patterns. The proposed methods can be generalized to areas with extensive snow/cloud cover and large proportions of missing satellite AOD data to predict PM2.5 levels with high resolution and complete coverage.
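The "out-of-bag" R2 quoted for the gap-filling models is computed from predictions that each bootstrap-trained model makes on the samples it never saw. The sketch below shows only that evaluation idea, under loud simplifications: a bagged ensemble of one-variable linear fits stands in for the random forest, and the cloud-fraction/AOD toy data, `fit_line`, and `oob_r2` are hypothetical names and values, not taken from the study.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def oob_r2(xs, ys, n_models=200, seed=0):
    """Out-of-bag R2 of a bagged ensemble (stand-in for a random forest)."""
    rng = random.Random(seed)
    n = len(xs)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # only score samples this model never saw
                sums[i] += a + b * xs[i]
                counts[i] += 1
    scored = [i for i in range(n) if counts[i] > 0]
    mean_y = sum(ys[i] for i in scored) / len(scored)
    ss_res = sum((ys[i] - sums[i] / counts[i]) ** 2 for i in scored)
    ss_tot = sum((ys[i] - mean_y) ** 2 for i in scored)
    return 1.0 - ss_res / ss_tot


# Toy data: AOD falling with cloud fraction, plus a small deterministic wobble.
cloud = [i / 29 for i in range(30)]
aod = [0.6 - 0.4 * c + 0.01 * ((i % 3) - 1) for i, c in enumerate(cloud)]
score = oob_r2(cloud, aod)
```

Because every sample is scored only by models that did not train on it, the OOB R2 behaves like a built-in cross-validation estimate and needs no separate hold-out set.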
It is well recognized that exposure to fine particulate matter (PM2.5) affects health adversely, yet few studies from South America have documented such associations because PM2.5 measurements are sparse. Lima’s topography and aging vehicular fleet result in severe air pollution, and the city has too few monitors to quantify PM2.5 levels effectively for epidemiologic studies. We developed an advanced machine learning model to estimate daily PM2.5 concentrations at a 1 km2 spatial resolution in Lima, Peru from 2010 to 2016. We combined aerosol optical depth (AOD), meteorological fields from the European Centre for Medium-Range Weather Forecasts (ECMWF), parameters from the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem), and land use variables to fit a random forest model against ground measurements from 16 monitoring stations. The overall cross-validation R2 for the random forest model was 0.70, with a root mean square prediction error (RMSE) of 5.97 μg/m3. Mean PM2.5 for ground measurements was 24.7 μg/m3, while mean estimated PM2.5 was 24.9 μg/m3 in the cross-validation dataset. The mean difference between ground and predicted measurements was −0.09 μg/m3 (Std. Dev. = 5.97 μg/m3), with 94.5% of observations falling within 2 standard deviations of the difference, indicating good agreement between ground measurements and predicted estimates. Surface downwards solar radiation, temperature, relative humidity, and AOD were the most important predictors, while percent urbanization, albedo, and cloud fraction were the least important. Comparison of monthly mean ground and predicted PM2.5 shows good precision and accuracy from our model. Furthermore, mean annual maps of PM2.5 show consistently lower concentrations along the coast and higher concentrations in the mountains, resulting from prevailing coastal winds blowing from the Pacific Ocean in the west.
Our model allows for construction of long-term historical daily PM2.5 measurements at 1 km2 spatial resolution to support future epidemiological studies.
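The agreement statistics reported for the Lima model (mean difference, its standard deviation, the share of observations within 2 standard deviations, RMSE, and R2) can be reproduced on any paired set of observed and predicted values. The sketch below is illustrative only: the function name, the observed-minus-predicted sign convention, the sample standard deviation (n − 1), and the coefficient-of-determination form of R2 are assumptions, and the numbers are toy values rather than data from the study.

```python
import math


def agreement_stats(observed, predicted):
    """Summary agreement statistics for paired observed/predicted values."""
    n = len(observed)
    diffs = [o - p for o, p in zip(observed, predicted)]  # observed - predicted
    mean_diff = sum(diffs) / n
    # Sample standard deviation of the differences.
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    ss_res = sum(d * d for d in diffs)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Share of differences lying within 2 SD of the mean difference.
    within = sum(abs(d - mean_diff) <= 2 * sd_diff for d in diffs) / n
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "rmse": rmse,
        "r2": 1.0 - ss_res / ss_tot,
        "share_within_2sd": within,
    }


# Toy PM2.5 values in ug/m3 (not from the study).
stats = agreement_stats([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 31.0, 39.0])
```

A mean difference near zero with a high share of points inside the 2 SD band is the pattern the abstract describes as good agreement.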