Abstract. With the AQ-Bench dataset, we contribute to the recent developments towards shared data usage and machine learning methods in the field of environmental science. The dataset presented here enables researchers to relate global air quality metrics to easy-access metadata and to explore different machine learning methods for obtaining estimates of air quality based on this metadata. AQ-Bench contains a unique collection of aggregated air quality data from the years 2010–2014 and metadata at more than 5500 air quality monitoring stations all over the world, provided by the first Tropospheric Ozone Assessment Report (TOAR). It focuses in particular on metrics of tropospheric ozone, which has a detrimental effect on climate, human morbidity and mortality, as well as crop yields. The purpose of this dataset is to produce estimates of various long-term ozone metrics based on time-independent local site conditions. We combine this task with a suitable evaluation metric. Baseline scores obtained from a linear regression method, a fully connected neural network and random forest are provided for reference and validation. AQ-Bench offers a low-threshold entrance for all machine learners with an interest in environmental science and for atmospheric scientists who are interested in applying machine learning techniques. It enables them to start with a real-world problem relevant to humans and nature. The dataset and introductory machine learning code are available at https://doi.org/10.23728/b2share.30d42b5a87344e82855a486bf2123e9f (Betancourt et al., 2020) and https://gitlab.version.fz-juelich.de/esde/machine-learning/aq-bench (Betancourt et al., 2021). AQ-Bench thus provides a blueprint for environmental benchmark datasets as well as an example for data re-use according to the FAIR principles.
Abstract. Tropospheric ozone is a toxic greenhouse gas with a highly variable spatial distribution which is challenging to map on a global scale. Here, we present a data-driven ozone-mapping workflow generating a transparent and reliable product. We map the global distribution of tropospheric ozone from sparse, irregularly placed measurement stations to a high-resolution regular grid using machine learning methods. The produced map contains the average tropospheric ozone concentration of the years 2010–2014 with a resolution of 0.1° × 0.1°. The machine learning model is trained on AQ-Bench (“air quality benchmark dataset”), a pre-compiled benchmark dataset consisting of multi-year ground-based ozone measurements combined with an abundance of high-resolution geospatial data. Going beyond standard mapping methods, this work focuses on two key aspects to increase the integrity of the produced map. Using explainable machine learning methods, we ensure that the trained machine learning model is consistent with commonly accepted knowledge about tropospheric ozone. To assess the impact of data and model uncertainties on our ozone map, we show that the machine learning model is robust against typical fluctuations in ozone values and geospatial data. By inspecting the input features, we ensure that the model is only applied in regions where it is reliable. We provide a rationale for the tools we use to conduct a thorough global analysis. The methods presented here can thus be easily transferred to other mapping applications to ensure the transparency and reliability of the maps produced.
Abstract. With the AQ-Bench dataset, we contribute to the recent developments towards shared data usage and machine learning methods in the field of environmental science. The dataset presented here enables researchers to relate global air quality metrics to easy-access metadata and to explore different machine learning methods for obtaining estimates of air quality based on this metadata. AQ-Bench contains a unique collection of aggregated air quality data from the years 2010–2014 and metadata at more than 5500 air quality monitoring stations all over the world, provided by the first Tropospheric Ozone Assessment Report (TOAR). It focuses in particular on metrics of tropospheric ozone, which has a detrimental effect on climate, human morbidity and mortality, as well as crop yields. We validate these data as a machine learning benchmark by providing a well-defined task together with a suitable evaluation metric. Baseline scores obtained from a linear regression method, a fully connected neural network and random forest are provided for reference. AQ-Bench offers a low-threshold entrance for all machine learners with an interest in environmental science and for atmospheric scientists who are interested in applying machine learning techniques. It enables them to start with a real-world problem relevant to humans and nature. The dataset and introductory machine learning code are available at https://doi.org/10.23728/b2share.30d42b5a87344e82855a486bf2123e9f (Betancourt et al., 2020) and https://gitlab.version.fz-juelich.de/toar/ozone-mapping. AQ-Bench thus provides a blueprint for environmental benchmark datasets as well as an example for data re-use according to the FAIR principles.
Through the availability of multi-year ground-based ozone observations on a global scale, substantial geospatial metadata, and high-performance computing capacities, it is now possible to use machine learning for a global data-driven ozone assessment. In this presentation, we will show a novel, completely data-driven approach to map tropospheric ozone globally.
Our goal is to interpolate ozone metrics and aggregated statistics from the database of the Tropospheric Ozone Assessment Report (TOAR) onto a global 0.1° × 0.1° resolution grid. It is challenging to interpolate ozone, a toxic greenhouse gas, because its formation depends on many interconnected environmental factors on small scales. We conduct the interpolation with various machine learning methods trained on aggregated hourly ozone data from five years at more than 5500 locations worldwide. We use several geospatial datasets as training inputs to provide proxy input for environmental factors controlling ozone formation, such as precursor emissions and climate. The resulting maps contain different ozone metrics, i.e. statistical aggregations which are widely used to assess air pollution impacts on health, vegetation, and climate.
The key aspects of this contribution are twofold: First, we apply explainable machine learning methods to the data-driven ozone assessment. Second, we discuss dominant uncertainties relevant to the ozone mapping and quantify their impact whenever possible. Our methods include a thorough a priori uncertainty estimation of the various data and methods, assessment of scientific consistency, finding critical model parameters, using ensemble methods, and performing error modeling.
Our work aims to increase the reliability and integrity of the derived ozone maps by bringing scientific robustness to a data-centric machine learning task. This study hence represents a blueprint for how to formulate an environmental machine learning task scientifically, gather the necessary data, and develop a data-driven workflow that focuses on optimizing transparency and applicability of its product to maximize its scientific knowledge return.
Abstract. Explainable machine learning has recently gained attention due to its contribution to understanding how a model works and why certain decisions are made. A goal that has so far received less attention, especially in remote sensing, is the derivation of new knowledge and scientific insights from observational data. In our paper, we propose an explainable machine learning approach to address the challenge that certain land cover classes such as wilderness are not well-defined in satellite imagery and can only be used with vague labels for mapping. Our approach consists of a combined U-Net and ResNet-18 that can perform scene classification while providing at the same time interpretable information with which we can derive new insights about classes. We show that our methodology allows us to deepen our understanding of what makes nature wild by automatically identifying simple concepts such as wasteland that semantically describe wilderness. It further quantifies a class’s sensitivity with respect to a concept and uses it as an indicator for how well a concept describes the class.