2020
DOI: 10.1021/acs.jcim.9b00714
Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set?

Abstract: In recent years, protein−ligand interaction scoring functions derived through machine learning have repeatedly been reported to outperform conventional scoring functions. However, several published studies have questioned whether the superior performance of machine-learning scoring functions depends on the overlap between the training set and the test set. In order to examine the true power of machine-learning algorithms in scoring function formulation, we have conducted a systematic study of six off-the-shelf mac…

Cited by 74 publications (152 citation statements)
References 62 publications
“…All three models (LB, SB, and HB) are strongly affected by the similarity between the training and test sets: excluding training-set complexes whose proteins or ligands are similar to those in the test set significantly reduces performance. These results echo our earlier results 15 and those of Su et al. 33, indicating that even when potentially less-accurate binding poses are used, it is necessary to consider the effect of biases in the available structural data when training and evaluating models. The inclusion of ligand-based features in structure-based models always improves performance when using docked poses, and only ceases to improve performance when using crystal poses if the maximum fingerprint Tanimoto similarity between ligands in the training and test sets is less than or equal to 0.5.…”
Section: Effect of Training and Testing Using Docked Poses (supporting)
confidence: 90%
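The Tanimoto-similarity criterion described above can be sketched in a few lines. This is a minimal pure-Python illustration, not code from any of the cited studies: real pipelines typically compute Morgan fingerprints with a cheminformatics toolkit such as RDKit, whereas here fingerprints are simply modeled as sets of "on" bit indices, and the function names (`tanimoto`, `filter_training_set`) are hypothetical.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not fp_a and not fp_b:
        return 0.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def filter_training_set(train_fps, test_fps, threshold=0.5):
    """Keep only training ligands whose maximum Tanimoto similarity
    to any test ligand is at most `threshold` (the 0.5 cutoff above)."""
    kept = []
    for fp in train_fps:
        max_sim = max((tanimoto(fp, t) for t in test_fps), default=0.0)
        if max_sim <= threshold:
            kept.append(fp)
    return kept
```

For example, `tanimoto({1, 2, 3}, {2, 3, 4})` is 2/4 = 0.5, so with a threshold of 0.4 that training ligand would be excluded from the training set.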
“…Random forest methods, which were best suited for the ΔG regression on known binders, were for most proteins outperformed by the simpler linear regression methods. This observation might support the recent finding that random forest methods, in particular, benefit from highly similar training molecules (Su et al., 2020). Considering the strengths and weaknesses of the different machine learning methods, we therefore recommend that for applications of RASPD+, the results of the seven different machine learning methods are combined by picking top candidates from the rankings produced by each method.…”
Section: Discussion (supporting)
confidence: 73%
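The contrast drawn above, that memorization-prone learners shine only when similar training molecules exist, can be illustrated with a toy example. This is not the RASPD+ code or the paper's models: a 1-nearest-neighbour "memorizer" stands in for a tree ensemble (which also predicts from stored training labels), against a closed-form linear fit, on data following y = 2x.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit y = a*x + b (1-D, closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def fit_nearest(xs, ys):
    """1-nearest-neighbour 'memorizer': returns the label of the closest
    training point (a crude stand-in for tree-ensemble behaviour)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Toy data: y = 2x, with training x confined to [0, 4].
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x for x in xs]
lin = fit_linear(xs, ys)
nn = fit_nearest(xs, ys)
```

Near the training data both predictors do well (`nn(1.9)` returns 4.0), but for a query far from anything seen in training, the memorizer can only repeat a stored label: `nn(10.0)` returns 8.0, while `lin(10.0)` extrapolates to 20.0. This is the sense in which such methods "benefit from highly similar training molecules".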
“…By splitting the PDBbind training, testing, and validation data in a nested cross-validation setup, we were able to assess reliably that random forest models, particularly the extremely random forest model, performed best on this type of data. While this splitting strategy increases confidence in the comparison of learning methods and feature importance analysis within the study, other data set splitting strategies, which explicitly control how similar proteins or ligands are between training and test sets (Feinberg et al., 2018; Sieg et al., 2019; Su et al., 2020), may be more appropriate to assess performance on completely different ligands or proteins directly.…”
Section: Discussion (mentioning)
confidence: 99%
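The nested cross-validation setup mentioned above can be sketched as index bookkeeping: an outer loop holds out a test fold that the inner model-selection loop never touches. This is a minimal pure-Python sketch with contiguous folds (real setups shuffle, and would split by protein or ligand cluster rather than raw index); the helper names are hypothetical.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nested_cv(n, outer_k, inner_k):
    """Yield (outer_train, outer_test, inner_splits); the outer test fold
    is never visible to the inner folds used for model selection."""
    for test in kfold_indices(n, outer_k):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        inner = []
        for fold in kfold_indices(len(train), inner_k):
            val = [train[j] for j in fold]
            fit = [i for i in train if i not in set(val)]
            inner.append((fit, val))
        yield train, test, inner
```

Hyperparameters are chosen on the inner (fit, val) splits only; the outer test fold then gives an unbiased performance estimate for the selected model.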
“…The highest enrichment was observed for ion channels, G-protein coupled receptors (GPCR), and kinases in both settings (Table 4). … analysis within the study, other data set splitting strategies, which explicitly control how similar proteins or ligands are between training and test sets 14,52,53, may be more appropriate to assess performance on completely different ligands or proteins directly.…”
Section: Enrichment of Active Molecules from the DUD-E Data Set (mentioning)
confidence: 99%