2018
DOI: 10.3390/biom8010012

The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction

Abstract: It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score, which has the best test set performance out of 16 classical SFs). …

Cited by 58 publications (56 citation statements)

References 26 publications
“…The final assessment on the PDBbind benchmark showed that this version could yield comparable scoring power to v2 (Rp = 0.803). Besides, a recent study indicated that RF‐Score‐v3 outperformed X‐Score even when 68% of the most similar proteins were removed from the training set, which further verified its superiority…”
Section: Traditional Machine Learning Methods in Scoring Functions (mentioning)
confidence: 99%
“…However, Ballester, the author of RF‐Score, doubted that LCOCV might ultimately be of little practical value, because these similarities cannot be ignored when an SF is employed in a real scenario. In addition, by conducting a study similar to that of Li et al., they concluded that ML‐based SFs can also learn from dissimilar training complexes, because RF‐Score was able to outperform X‐Score even when it was trained on just 32% of the most dissimilar complexes. Thus, in our opinion, in order to develop a reliable SF, both the conventional random training/test splitting and LCOCV are needed, so that a comparison can be made to gain a deeper understanding of the impact of the SF itself and the composition of the dataset on the performance of an SF.…”
Section: Workflow to Develop a Machine Learning‐based Scoring Function (mentioning)
confidence: 99%
“…It is important to note, too, that PM decoys are not required either to train or test QSAR models [35], despite predicting exactly the same in vitro potency/affinity endpoints as SFs (e.g. K d is predicted by both SFs [36,37] and QSAR models [38,39]).…”
Section: Selecting a Scoring Function Based on Your Own Evaluation (mentioning)
confidence: 99%
“…Therefore, the accuracy of our scoring function is comparable to or higher than that of most other known methods, both classical (empirical, force-field and knowledge-based, R ∼ 0.4-0.7; Y. Li et al., 2014; Su et al., 2019) and machine-learning based (R ∼ 0.6-0.8; H. Li et al., 2018; Y. Li et al., 2014; Su et al., 2019).…”
Section: Introduction (mentioning)
confidence: 61%