Global, high-resolution mapping of tropospheric ozone – explainable machine learning and impact of uncertainties

Betancourt, Clara; Stomberg, Timo T.; Edrich, Ann-Kathrin; Patnala, Ankit; Schultz, Martin; Roscher, Ribana; Kowalski, Julia; Stadtler, Scarlet

doi:10.5194/gmd-15-4331-2022

Cited by 19 publications

(27 citation statements)

References 56 publications


“…We chose machine learning as an alternative method to propose new station locations, which is a task that is also tackled by using an atmospheric chemistry model [42]. Although we show that the number of underrepresented test samples is not a significant issue for the prediction on the test dataset, underrepresented locations become problematic in the case of applying the models to areas outside the AQ-Bench dataset, e.g., in (global) mapping studies [13,41,43].…”

Section: Discussion (mentioning)

confidence: 99%

“…Ref. [13] showed that forward feature selection applied on AQ-Bench leads to 31 features. The data split is kept as in AQ-Bench with 60% training (approximately 3300 samples) and 20% validation and test samples (roughly 1110 samples, respectively).…”

Section: Model Training (mentioning)

confidence: 99%

“…What is rarely done, to our knowledge, is to explain the differences between various machine learning architectures applied to the same task. [13].…”

Section: Introduction (mentioning)

confidence: 99%


Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset

Stadtler, Betancourt, Roscher (2022), MAKE

Self Cite


Air quality is relevant to society because it poses environmental risks to humans and nature. We use explainable machine learning in air quality research by analyzing model predictions in relation to the underlying training data. The data originate from worldwide ozone observations, paired with geospatial data. We use two different architectures: a neural network and a random forest trained on various geospatial data to predict multi-year averages of the air pollutant ozone. To understand how both models function, we explain how they represent the training data and derive their predictions. By focusing on inaccurate predictions and explaining why these predictions fail, we can (i) identify underrepresented samples, (ii) flag unexpected inaccurate predictions, and (iii) point to training samples irrelevant for predictions on the test set. Based on the underrepresented samples, we suggest where to build new measurement stations. We also show which training samples do not substantially contribute to the model performance. This study demonstrates the application of explainable machine learning beyond simply explaining the trained model.


“…In the absence of a global observation-based ozone climatology, we apply the CAMS reanalysis product in our analysis. We note that recent studies have explored the fusion of observations and model output to generate surface O3 products at a global scale (Chang et al., 2019; Betancourt et al., 2022), but these approaches only work well in regions where measurement sites are available. The CAMS reanalysis provides surface concentrations at a scale comparable with our model, and thus avoids uncertainties associated with the spatial representativeness of observations when using measured concentrations.…”

Section: Deep Learning Model Application (mentioning)

confidence: 99%

Correcting ozone biases in a global chemistry–climate model: implications for future ozone

et al. 2022


Abstract. Weaknesses in process representation in chemistry–climate models lead to biases in simulating surface ozone and to uncertainty in projections of future ozone change. We here develop a deep learning model to demonstrate the feasibility of ozone bias correction in a global chemistry–climate model. We apply this approach to identify the key factors causing ozone biases and to correct projections of future surface ozone. Temperature and the related geographic variables latitude and month show the strongest relationship with ozone biases. This indicates that ozone biases are sensitive to temperature and suggests weaknesses in representation of temperature-sensitive physical or chemical processes. Photolysis rates are also an important factor, highlighting the sensitivity of biases to simulated cloud cover and insolation. Atmospheric chemical species such as the hydroxyl radical, nitric acid and peroxyacyl nitrate show strong positive relationships with ozone biases on a regional scale. These relationships reveal the conditions under which ozone biases occur, although they reflect association rather than direct causation. We correct model projections of future ozone under different climate and emission scenarios following the shared socio-economic pathways. We find that changes in seasonal ozone mixing ratios from the present day to the future are generally smaller than those simulated without bias correction, especially in high-emission regions. This suggests that the ozone sensitivity to changing emissions and climate may be overestimated with chemistry–climate models. Given the uncertainty in simulating future ozone, we show that deep learning approaches can provide improved assessment of the impacts of climate and emission changes on future air quality, along with valuable information to guide future model development.


“…If training datasets don’t themselves contain error uncertainty values, scientists could experiment with Bayesian-style priors that they themselves estimate so that uncertainty can still be properly represented and propagated (e.g., ref. 5).…”

mentioning

confidence: 99%

Ways forward for Machine Learning to make useful global environmental datasets from legacy observations and measurements

2022, Nat Commun


Advances in geospatial and Machine Learning techniques for large datasets of georeferenced observations have made it possible to produce model-based global maps of ecological and environmental variables. However, the implementation of existing scientific methods (especially Machine Learning models) to produce accurate global maps is often complex. Tomislav Hengl (co-founder of OpenGeoHub foundation), Johan van den Hoogen (researcher at ETH Zürich), and Devin Routh (Science IT Consultant at the University of Zürich) shared with Nature Communications their perspectives for creators and users of these maps, focusing on the key challenges in producing global environmental geospatial datasets to achieve significant impacts.

