2023
DOI: 10.1021/acs.analchem.3c00921

Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis

Abstract: Aqueous solubility, log S, and the water−octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P va…

Cited by 9 publications (5 citation statements)
References 61 publications
“…Using the LGBM model, feature importance of the input descriptors was performed using Shapley additive explanations (SHAP) to examine the relative contribution of each descriptor to the predicted result (Figure S7). This examination revealed two basic findings: First, the experimental descriptors (CNT type, CNT quantity, and ratio of dispersant to CNT) as a whole contributed more significantly to the model predictions. Second, molecular descriptors which captured the overall features of the molecular species (e.g., BCUT2D, Chi, SMR_VSA, and AvgIpc) contributed more than those which described more specific aspects of the molecule, such as the number of atoms (Supporting Information).…”
Section: Results (mentioning)
confidence: 93%
“…Its core principle is to calculate the marginal contribution of features to the model output, then explain the “black box model” on both the global and local levels. SHAP builds an additive explanation model, all features are regarded as “contributors”, and the mechanisms of influence behind different influencing factors on the target value are explained by deconstructing the contribution value. SHAP is a “model interpretation” package developed by Python. In this study, Python 3.11 was used with the XGBoost model.…”
Section: Methods (mentioning)
confidence: 99%
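The "marginal contribution" and "additive explanation" ideas in the statement above can be illustrated with a minimal sketch. It computes exact Shapley values by brute force for a hypothetical three-feature toy model, with absent features replaced by an assumed all-zero baseline. The function names, the toy model, and the baseline are illustrative assumptions; this is not the citing study's XGBoost pipeline, and the SHAP package itself relies on faster model-specific algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x, relative to a baseline input.

    A feature that is "absent" from a coalition takes its baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for tiny illustrative models.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical model: one additive term plus one interaction term."""
    a, b, c = features
    return 2.0 * a + b * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference input
phi = shapley_values(toy_model, x, baseline)

# Additivity: contributions sum to the gap between prediction and baseline
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), gap)
```

In this toy case the additive term is credited entirely to the first feature, while the interaction term is split equally between the two features that take part in it. The contributions add up to the difference between the model output for the instance and the output for the baseline, which is the sense in which the explanation is "additive".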
“…As a data-driven approach, ML models can be trained on available solubility datasets to learn solute-solvent interactions for prediction of unmeasured solubility. To date, most of these ML models were developed to model the solute solubility in a specific solvent [14–24], with a primary focus on aqueous solubility due to availability of extensive data. These solvent-specific models generally deliver high accuracy because their design does not require them to account for variations between different solvents.…”
Section: Introduction (mentioning)
confidence: 99%