2018
DOI: 10.1021/acs.jcim.7b00403

Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization

Abstract: Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems, that accounts for the similarity among inactive molecules as well as active ones. We investigated seven widely used benchmarks for virtual screening and classification, and we show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective…

Citations: cited by 201 publications (317 citation statements)
References: 39 publications (101 reference statements)
“…The DEKOIS project (currently at version 2.0) [71,72] is intended to provide a "demanding" […] Among the 81 proteins in the DEKOIS set, we noted that some were included in our training set as well. To avoid any potential information leakage that might overestimate the performance we could expect in future applications [33], we completely removed these testcases. This left a set of 23 protein targets, each of which vScreenML had never seen before.…”
Section: Benchmarking vScreenML Using Independent Test Sets
Citation type: mentioning
confidence: 99%
“…For these reasons, machine learning techniques may be especially well-suited for developing scoring functions that will provide a dramatic improvement in the ability to identify active compounds without human expert intervention. However, while machine learning may offer the potential to improve on the high false positive rate of current scoring function, further analysis has revealed that many methods to date reporting promising results in artificial benchmark experiments may have inadvertently overfit models to the training data [33]: this can be a subtle effect of information leakage, occurring when the validation/testing data are not truly non-redundant from the training data.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…This version of the CNN [16] was the subject of all the analyses so far carried out into the importance of the receptor in active/decoy classification. [1][2][3] DenseFS: A much deeper network with three sets of four densely connected convolutional layers followed by a fully-connected softmax layer and cross entropy loss. [18] This network significantly improved performance over the Gnina network on both held-out DUD-E targets and the ChEMBL set.…”
Section: CNN Architectures
Citation type: mentioning
confidence: 99%
“…A series of recent papers has shown that some deep learning methods designed for structure-based virtual screening can accurately separate actives and decoys when given only the structure of the ligand. [1][2][3] These results indicate that such methods are learning differences between the properties of actives and decoys, rather than the physical interactions between the receptor and the ligand. From this it is possible to conclude both that the methods will fail to generalize well (predict on datasets far removed from the training data), and that there are significant flaws in the current training datasets and/or regimens.…”
Section: Introduction
Citation type: mentioning
confidence: 97%