Abstract. The heterogeneous chemistry of atmospheric aerosols involves multiphase chemical kinetics that can be described by kinetic multi-layer models (KMs) that explicitly resolve mass transport and chemical reactions. However, KMs are computationally too expensive to be used as sub-modules in large-scale atmospheric models, and the computational costs also limit their utility in inverse-modeling approaches commonly used to infer aerosol kinetic parameters from laboratory studies. In this study, we show how machine learning methods can generate inexpensive surrogate models for the kinetic multi-layer model of aerosol surface and bulk chemistry (KM-SUB) to predict reaction times in multiphase chemical systems. We apply and compare two common and openly available methods for the generation of surrogate models, polynomial chaos expansion (PCE) with UQLab and neural networks (NNs) through the Python package Keras. We show that the PCE method is well suited to determining global sensitivity indices of the KMs, and we demonstrate how inverse-modeling applications can be enabled or accelerated with NN-suggested sampling. These qualities make them suitable supporting tools for laboratory work in the interpretation of data and the design of future experiments. Overall, the KM surrogate models investigated in this study are fast, accurate, and robust, which suggests their applicability as sub-modules in large-scale atmospheric models.
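The link between a polynomial chaos surrogate and global sensitivity indices can be illustrated with a minimal sketch. This is not UQLab and not KM-SUB: `toy_reaction_time` is a hypothetical stand-in for an expensive model, the two inputs are assumed to be independent and uniform on [-1, 1], and the projection integrals use a plain midpoint rule rather than any method from the study. All function names are illustrative.

```python
import math
from itertools import product


def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) with the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def toy_reaction_time(x1, x2):
    """Hypothetical cheap stand-in for an expensive kinetic model (assumption)."""
    return math.exp(0.5 * x1) + 0.3 * x2 + 0.2 * x1 * x2


def fit_pce(model, degree=4, n_grid=100):
    """Project model(x1, x2) onto tensor-product Legendre polynomials.

    Assumes both inputs are uniform on [-1, 1]; integrals use a midpoint grid.
    Returns a dict mapping (j, k) to the coefficient of P_j(x1) * P_k(x2).
    """
    h = 2.0 / n_grid
    nodes = [-1.0 + (i + 0.5) * h for i in range(n_grid)]
    basis = [[legendre(k, x) for x in nodes] for k in range(degree + 1)]
    values = [[model(x1, x2) for x2 in nodes] for x1 in nodes]
    coeffs = {}
    for j, k in product(range(degree + 1), repeat=2):
        integral = h * h * sum(
            values[a][b] * basis[j][a] * basis[k][b]
            for a in range(n_grid)
            for b in range(n_grid)
        )
        # Orthogonality: the integral of P_j^2 over [-1, 1] is 2 / (2j + 1).
        coeffs[(j, k)] = (2 * j + 1) * (2 * k + 1) / 4.0 * integral
    return coeffs


def surrogate(coeffs, x1, x2):
    """Evaluate the fitted expansion; this is the inexpensive surrogate."""
    return sum(c * legendre(j, x1) * legendre(k, x2) for (j, k), c in coeffs.items())


def sobol_first_order(coeffs):
    """First-order Sobol indices of x1 and x2, read off the PCE coefficients."""
    def var_term(j, k):
        return coeffs[(j, k)] ** 2 / ((2 * j + 1) * (2 * k + 1))

    total = sum(var_term(j, k) for (j, k) in coeffs if (j, k) != (0, 0))
    s1 = sum(var_term(j, k) for (j, k) in coeffs if j >= 1 and k == 0) / total
    s2 = sum(var_term(j, k) for (j, k) in coeffs if j == 0 and k >= 1) / total
    return s1, s2


coeffs = fit_pce(toy_reaction_time)
print("surrogate:", surrogate(coeffs, 0.3, -0.4))
print("model:    ", toy_reaction_time(0.3, -0.4))
print("first-order Sobol indices:", sobol_first_order(coeffs))
```

Once the coefficients are known, the sensitivity indices come from sums of squared coefficients, with no further model evaluations needed.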
Quinones are chemical compounds commonly found in air particulate matter (PM). Their redox activity can generate reactive oxygen species (ROS) and contribute to the oxidative potential (OP) of PM, leading to adverse health effects of aerosols. The quinones' OP and ability to form ROS are linked to their reduction potential (RP, measured in volts), a metric for the tendency to lose electrons in redox reactions. Here, we use convolutional neural networks (CNNs) as quantitative structure-activity relationship (QSAR) models to relate the one-electron RP of quinones to their molecular structure. For CNN training and testing, a data set of more than 100,000 quinones with associated RP values derived from density functional theory calculations was encoded in the simplified molecular input line entry system (SMILES). The best-performing CNN model achieved a root mean square error (RMSE) of 0.115 V for an independent test data set and outperformed linear regression models fitted on common molecular descriptors (≥ 0.140 V RMSE). Augmentation methods were newly adapted or applied to support CNN training with smaller data sets, improving RMSE by up to approximately 37% for a data set of 321 molecules. Adjusted for solvent effects, the CNN-derived RP predictions showed good agreement with experimental data. Using the newly developed method, we identified a subset of atmospherically relevant quinones that are likely to have a high OP and play a role in aerosol health effects, which remains to be further elucidated by experimental studies. We suggest using the presented machine learning approach in further investigations of atmospheric aerosol chemistry and health effects, as well as in other studies that require a target-oriented screening of the properties and effects of large classes of substances.
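To make the input and the error metric concrete, the following minimal sketch turns SMILES strings into one-hot matrices of the kind a CNN could consume, and computes an RMSE. The character-level tokenization is an assumption for illustration (it would split two-letter atoms such as Cl), and the abstract does not state the encoding actually used. The function names are hypothetical, and the numbers passed to `rmse` are placeholders, not data from the study.

```python
import math


def build_vocab(smiles_list):
    """Map every character that occurs in the SMILES strings to a column index."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(smiles, vocab, max_len):
    """Encode one SMILES string as a max_len x len(vocab) matrix of 0s and 1s.

    Strings shorter than max_len are zero-padded; longer ones are truncated.
    """
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[pos][vocab[ch]] = 1
    return matrix


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# 1,4-benzoquinone and 1,4-naphthoquinone as example inputs.
molecules = ["O=C1C=CC(=O)C=C1", "O=C1C=CC(=O)c2ccccc12"]
vocab = build_vocab(molecules)
max_len = max(len(s) for s in molecules)
encoded = [one_hot(s, vocab, max_len) for s in molecules]
print(len(encoded), "matrices of shape", (max_len, len(vocab)))

# Placeholder values in volts, purely to show the metric (not study data).
print("RMSE:", rmse([0.10, 0.30], [0.20, 0.10]))
```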
Abstract. The heterogeneous chemistry of atmospheric aerosols involves multiphase chemical kinetics that can be described by kinetic multi-layer models (KM) that explicitly resolve mass transport and chemical reactions. However, KM are computationally too expensive to be used as sub-modules in large-scale atmospheric models, and the computational costs also limit their utility in inverse modelling approaches commonly used to infer aerosol kinetic parameters from laboratory studies. In this study, we show how machine learning methods can generate inexpensive surrogate models based on the kinetic multi-layer model of aerosol surface and bulk chemistry (KM-SUB). We apply and compare two common and openly available methods for the generation of surrogate models, polynomial chaos expansion (PCE) with UQLab and neural networks (NN) through the Python package Keras. We show that the PCE method is well suited to determining global sensitivity indices of the KM, and we demonstrate how inverse modelling applications can be enabled or accelerated with NN-suggested sampling. These qualities make them suitable supporting tools for laboratory work in the interpretation of data and the design of future experiments. Overall, the KM surrogate models investigated in this study are fast, accurate, and robust, which suggests their applicability as sub-modules in large-scale atmospheric models.
More and more next-generation sequencing (NGS) data are made available every day. However, the quality of these data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know whether quality features are relevant in all experimental conditions. Therefore, the NGS community would benefit greatly from condition-specific, data-driven guidelines derived from many publicly available experiments, which reflect routinely generated NGS data. In this work, we have characterized well-known quality guidelines and related features in big datasets and concluded that they are too limited for assessing the quality of a given NGS file accurately. Therefore, we present new data-driven guidelines derived from the statistical analysis of many public datasets using quality features calculated by common bioinformatics tools. Thanks to this approach, we confirm the high relevance of genome mapping statistics for assessing the quality of the data, and we demonstrate the limited scope of some quality features that are not relevant in all conditions. Our guidelines are available at https://cbdm.uni-mainz.de/ngs-guidelines.