Abstract. Predictions from process-based models of environmental systems are biased, due to uncertainties in their inputs and parameterisations, reducing their utility. We develop a predictor for the bias in tropospheric ozone (a key pollutant) calculated by an atmospheric chemistry transport model (GEOS-Chem), based on outputs from the model and observations of ozone from both the surface (EPA, EMEP and GAW) and the ozone-sonde networks. We train a gradient-boosted decision tree algorithm (XGBoost) to predict model bias, with model and observational data for 2010–2015, and then test the approach using the years 2016–2017. We show that the bias-corrected model performs significantly better than the uncorrected model. The root mean square error (RMSE) is reduced from 16.21 ppb to 7.48 ppb, the normalised mean bias (NMB) is reduced from 0.28 to −0.04, and the Pearson's R is increased from 0.479 to 0.841. Comparisons with observations from the NASA ATom flights (which were not included in the training) also show improvements, but to a smaller extent, reducing the RMSE from 12.11 ppb to 10.50 ppb and the NMB from 0.08 to 0.06, and increasing the Pearson's R from 0.761 to 0.792. We attribute the smaller improvements to the lack of routine observational constraints of the remote troposphere. We explore the choice of predictor (bias prediction versus direct prediction) and conclude both may have utility. We show that the method is robust to variations in the volume of training data, with approximately a year of data needed to produce useful performance. Data denial experiments (removing observational sites from the algorithm training) show that information from one location (for example Europe) can reduce the model bias over other locations (for example North America), which might provide insights into the processes controlling the model bias.
We conclude that combining machine learning approaches with process-based models may provide a useful tool for improving performance of air quality forecasts or to provide enhanced assessments of the impact of pollutants on human and ecosystem health, and may have utility in other environmental applications.
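The evaluation statistics quoted in this abstract (RMSE, NMB and Pearson's R) and the bias-correction step can be illustrated with a minimal sketch. It assumes that bias is defined as model minus observation and that NMB is the summed bias divided by the summed observations; the abstract does not state either definition. The function names and toy values are hypothetical, and the predicted bias is a stand-in list, not output from XGBoost.

```python
import math
from typing import List, Sequence


def rmse(model: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between model values and observations."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def nmb(model: Sequence[float], obs: Sequence[float]) -> float:
    """Normalised mean bias, assumed here to be sum(model - obs) / sum(obs)."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)


def pearson_r(model: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between model values and observations."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


def bias_correct(model: Sequence[float], predicted_bias: Sequence[float]) -> List[float]:
    """Subtract a predicted bias (assumed to be model minus observation) from the model."""
    return [m - b for m, b in zip(model, predicted_bias)]


# Toy ozone values in ppb (illustrative only, not data from the paper).
obs = [30, 40, 50, 60]
model = [40, 50, 60, 70]
predicted_bias = [10, 10, 10, 10]  # stand-in for a trained bias predictor

corrected = bias_correct(model, predicted_bias)
print(rmse(model, obs), nmb(model, obs), pearson_r(model, obs))
print(rmse(corrected, obs), nmb(corrected, obs))
```

In this toy case the stand-in predictor is perfect, so the corrected values match the observations exactly. A real predictor would leave residual error, as the reported post-correction RMSE of 7.48 ppb indicates.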
Abstract. Low-cost sensors (LCS) are an appealing solution to the problem of spatial resolution in air quality measurement, but they currently do not have the same analytical performance as regulatory reference methods. Individual sensors can be susceptible to analytical cross-interferences, have random signal variability and experience drift over short, medium and long timescales. To overcome some of the performance limitations of individual sensors, we use a clustering approach that takes the instantaneous median signal from six identical electrochemical sensors to minimise the randomised drifts and inter-sensor differences. We report here a low-power analytical device (
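The instantaneous-median idea described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the data layout (one row of simultaneous readings per time step) and the sample values are all hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence


def cluster_median(readings: Sequence[Sequence[float]]) -> List[float]:
    """Return the median across sensors for each time step.

    `readings` holds one inner sequence per time step, containing the
    simultaneous signals from the sensors in the cluster.
    """
    return [median(step) for step in readings]


# Two time steps from six hypothetical sensors (arbitrary units).
# In the first step one sensor reads far from the rest.
readings = [
    [40.1, 39.8, 40.3, 55.0, 40.0, 39.9],
    [41.0, 40.7, 41.2, 41.1, 40.9, 40.8],
]

medians = cluster_median(readings)
means = [mean(step) for step in readings]
print(medians)
print(means)
```

With an even number of sensors, `statistics.median` returns the average of the two middle values. In the first time step the outlying sensor shifts the mean noticeably but barely moves the median, which is the property the clustering approach relies on.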
Diurnal plots of inorganic and organic species are shown in Figure 2 of the main manuscript for 'typical' chemistry days, i.e. where ozone increases through the morning to an afternoon peak of > 70 ppb. This accounts for 25 of the total 34 days for which ozone measurements are available. The days removed from the analysis were 22/5,