2021
DOI: 10.1002/cem.3349

Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability?

Abstract: We present a collection of publicly available intrinsic aqueous solubility data of 829 drug-like compounds. Four different machine learning algorithms (random forests [RF], LightGBM, partial least squares, and least absolute shrinkage and selection operator [LASSO]) coupled with multistage permutation importance for feature selection and Bayesian hyperparameter optimization were used for the prediction of solubility based on chemical structural information. Our results show that LASSO yielded the best predicti…
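The permutation importance named in the abstract can be illustrated with a minimal sketch. This is not the authors' multistage procedure or their data: the synthetic features, the plain least-squares model fitted by gradient descent (a stand-in for LASSO, RF, or LightGBM), and the helper names `fit_linear` and `permutation_importance` are all assumptions made for illustration. The idea shown is only the core one: shuffle one feature column at a time and measure how much the prediction error grows.

```python
import random


def predict(weights, bias, row):
    """Linear prediction for one sample."""
    return bias + sum(w * x for w, x in zip(weights, row))


def mse(weights, bias, X, y):
    """Mean squared error of the linear model on (X, y)."""
    return sum((predict(weights, bias, r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def fit_linear(X, y, lr=0.1, epochs=500):
    """Hypothetical stand-in model: least squares fitted by full-batch gradient descent."""
    n, p = len(X), len(X[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, r) - t for r, t in zip(X, y)]
        bias -= lr * 2 * sum(errors) / n
        for j in range(p):
            grad = 2 * sum(e * r[j] for e, r in zip(errors, X)) / n
            weights[j] -= lr * grad
    return weights, bias


def permutation_importance(weights, bias, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(weights, bias, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in X]
            rng.shuffle(column)
            # Rebuild the data set with only column j permuted.
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
            increases.append(mse(weights, bias, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data (assumption): feature 0 matters most, feature 2 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * r[0] + 0.5 * r[1] + data_rng.gauss(0, 0.1) for r in X]

weights, bias = fit_linear(X, y)
importances = permutation_importance(weights, bias, X, y)
ranking = sorted(range(3), key=lambda j: importances[j], reverse=True)
print(ranking, importances)
```

A feature whose importance stays near zero is a candidate for removal. In practice the error would normally be measured on held-out data rather than on the training set used here.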

Cited by 50 publications (60 citation statements)
References 75 publications (141 reference statements)
“…Our comparison shows that an algorithm relying on a linear combination of variables fails on average against ensemble algorithms. The low performance of the linear model was also observed in our prior and other works regarding both regression [41,46] and classification tasks [30,44,47]. Besides failing due to nonlinear relationships and boundaries between the classes, linear models can also suffer from the utilization of irrelevant features in models and complex cancellation effects [30].…”
Section: Discussion; citation type: supporting
confidence: 60%
“…Additionally, MEF50-based outcomes were also predicted by hsCRP (Table 6), which is a marker of subtly elevated systemic inflammation in asthma [40]. The evidence shows that increased hsCRP is associated with more severe asthma outcomes [41]. This, in addition to the fact that the model predicting MEF50-related response performed better in almost all parameters (except specificity) compared to the FEV1-related response (see Table 5) and the fact that oversampling further improved the model's power in predicting true responders and non-responders (see Figure 2) for MEF50, highlights the importance of the distal airways in children with asthma [42].…”
Section: Discussion; citation type: mentioning
confidence: 95%
“…Furthermore, solubility is also an important factor in other processes such as dosage, pre-formulation, crystallization, purification, and quantification [1,2]. The importance of drug solubility has led to the development of one of the most important lines of research in the pharmaceutical industry, which consists in the development of mathematical models and has evolved towards the incursion of artificial intelligence in the development of algorithms that predict the solubility of drugs in different solvents [3][4][5][6][7][8][9].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Recent works on predicting solubility through machine learning utilized molecular fingerprints in their work [20, 24–26]. Previous studies use machine learning/deep learning (ML/DL) methods including random forest (RF), support vector regression (SVR), LightGBM, LASSO and so on [20, 24, 26], Naïve-Bayes based models [25], and also deep learning to predict solubility. While the importance of solubility prediction has been emphasized, various studies have been on solubility prediction has been reported.…”
Section: Introduction; citation type: mentioning
confidence: 99%