2023
DOI: 10.1021/acs.jcim.2c01189

Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models

Abstract: Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a “Second Solubility Challenge” in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. The…

Cited by 13 publications (8 citation statements)
References 50 publications
“…5 displays the chemical space graphically and follows the protocol described in some of the author's previous work. 89 This figure is generated using the zinc-20938 and ccs-lit-167 data sets. In this figure each molecule is represented as a node in the graph and the most similar (Tanimoto similarity scores of ≥0.7 using Morgan fingerprints with a radius of 2 and 2048 bits) are connected.…”
Section: Results (mentioning)
confidence: 99%
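The similarity graph described in the statement above can be sketched in a few lines. This is a minimal illustration, not the citing authors' code: it assumes each fingerprint is already available as a set of on-bit indices (in practice these would come from a cheminformatics toolkit generating Morgan fingerprints with radius 2 and 2048 bits), and the names `tanimoto`, `similarity_edges`, and the toy molecules are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_edges(fingerprints: dict, threshold: float = 0.7) -> list:
    """Return (name_a, name_b, score) for every pair with similarity >= threshold."""
    edges = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        score = tanimoto(fp_a, fp_b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Toy fingerprints (hypothetical bit indices, not real molecules).
toy = {
    "mol_a": frozenset({1, 2, 3, 4, 5, 6, 7, 8}),
    "mol_b": frozenset({1, 2, 3, 4, 5, 6, 7, 9}),
    "mol_c": frozenset({20, 21, 22}),
}
edges = similarity_edges(toy)
```

Each molecule is a node, and only pairs at or above the 0.7 threshold receive an edge, so structurally unrelated molecules remain disconnected.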
“…For each classification method, 50% of the samples were chosen as the training set by random sampling. The remaining samples were regarded as the test set. As shown in Figure 5, ROC plots were obtained using the (A) AdaBoost, (B) bagging, (C) DT, (D) KNN, (E) LDA, (F) NB, (G) PLS, (H) RF, and (I) SVM methods.…”
Section: Performance Evaluation Of Machine Learning Methods (mentioning)
confidence: 99%
“…The remaining samples were regarded as the test set. 97 As shown in Figure 5, ROC plots were obtained using the (A) AdaBoost, (B) bagging, (C) DT, (D) KNN, (E) LDA, (F) NB, (G) PLS, (H) RF, and (I) SVM methods. By comparison, the AUC values were significantly different when using different classification methods.…”
Section: Performance Evaluation Of Machine Learning Methods (mentioning)
confidence: 99%
“…We carry out a head-to-head comparison of the FEP+ solubility approach against state-of-the-art ML approaches using QM descriptors and show that the FEP+ approach achieves improved predictive accuracies (RMSE = 0.87 and 0.86 vs RMSE = 1.02 and 1.24) and differentiation power (R² = 0.64 and 0.69 vs 0.40 and 0.25) for the Literature and AbbVie data sets, respectively. While recent developments in neural network models have achieved similar accuracies in the recent solubility blind test (R² = 0.54, RMSE = 0.86) on a large set of chemically diverse compounds, the performance of such models may vary significantly depending on the data sets used for training…”
Section: Conclusion and Outlook (mentioning)
confidence: 99%
“…While recent developments in neural network models have achieved similar accuracies in the recent solubility blind test (R² = 0.54, RMSE = 0.86) on a large set of chemically diverse compounds, the performance of such models may vary significantly depending on the data sets used for training. 36 In addition to the predictive power of the crystalline FEP+ approach for both structurally similar and diverse compounds, the method also provides unique energetic insights by dissecting the underlying contributions of the solubility into sublimation and hydration energy components. Such insights can reveal invaluable information for rational drug design, as well as guide tailored formulation approaches for drug development.…”
Section: Conclusion and Outlook (mentioning)
confidence: 99%
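The RMSE and R² figures quoted in the statements above are standard regression metrics. The sketch below shows one common way to compute them; it assumes R² means the coefficient of determination (some papers report the squared Pearson correlation instead, which can differ), and the function names and toy log-solubility values are hypothetical rather than taken from any of the cited studies.

```python
import math


def rmse(y_true: list, y_pred: list) -> float:
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: list, y_pred: list) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy observed vs. predicted values (hypothetical, for illustration only).
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

A lower RMSE indicates smaller typical prediction error, while a higher R² indicates that the model separates more and less soluble compounds more reliably.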