2007
DOI: 10.1021/ci600426e

Evaluating Virtual Screening Methods: Good and Bad Metrics for the “Early Recognition” Problem

Abstract: Many metrics are currently used to evaluate the performance of ranking methods in virtual screening (VS), for instance, the area under the receiver operating characteristic curve (ROC), the area under the accumulation curve (AUAC), the average rank of actives, the enrichment factor (EF), and the robust initial enhancement (RIE) proposed by Sheridan et al. In this work, we show that the ROC, the AUAC, and the average rank metrics have the same inappropriate behaviors that make them poor metrics for comparing VS…

Citations: cited by 715 publications (912 citation statements)
References: 28 publications
“…The model with the highest value of BEDROC (α=160.9) among all single models within the ensemble is also the one with the best value of this metric among the set of representative valid models. This improvement might seem meaningless, however it can be interpreted as 5% more probability of retrieving a dual target ligand in the first 1% of the ranked list using the obtained ensemble compared to the best performing individual model [22]. From Table 2 it can also be seen that the values of BEDROC for α=32.2 and α=20 are also higher when the ensemble model is compared with its members.…”
Section: Results (mentioning)
confidence: 96%
“…al. [22] In our previous research [9] the VS-tailored ensembles were obtained using only a very limited set of predictive models that were selected as the best ones derived from each modeling approach. However, during the modeling process a larger number of high quality models are obtained that we did not consider for ensemble modeling in that investigation.…”
Section: Results (mentioning)
confidence: 99%
“…We have therefore chosen to use the Boltzmann-Enhanced ROC score (BEDROC) as an alternative evaluation score. BEDROC scales from 0.0 to 1.0, where a value closer to 1.0 indicates superior virtual screening performance 31.…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
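
The 0.0 to 1.0 scaling mentioned in the statement above can be illustrated with a minimal Python sketch. It assumes the commonly used formulation in which BEDROC is the RIE rescaled between its worst-case and best-case values for a given number of actives. The function names `rie` and `bedroc`, the 1-based rank input, and the default α = 20 are illustrative assumptions, not taken from the cited papers.

```python
import math


def rie(active_ranks, n_total, alpha):
    """Robust initial enhancement from the 1-based ranks of the actives."""
    n_actives = len(active_ranks)
    # Average exponential weight of the actives; early ranks weigh most.
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n_actives
    # The same average for a single active placed uniformly at random.
    expected = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    return observed / expected


def bedroc(active_ranks, n_total, alpha=20.0):
    """BEDROC as the RIE rescaled to [0, 1] (assumed formulation)."""
    n_actives = len(active_ranks)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    if n_actives == n_total:
        return 1.0  # every compound is active, so any ranking is perfect
    ratio = n_actives / n_total
    # RIE when all actives occupy the first ranks.
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    # RIE when all actives occupy the last ranks.
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie(active_ranks, n_total, alpha) - rie_min) / (rie_max - rie_min)


if __name__ == "__main__":
    # Hypothetical ranking: 100 compounds, 5 actives.
    print(bedroc([1, 2, 3, 4, 5], 100))       # actives ranked first
    print(bedroc([3, 10, 25, 60, 90], 100))   # actives spread out
    print(bedroc([96, 97, 98, 99, 100], 100)) # actives ranked last
```

Larger values of α concentrate the weight on a smaller top fraction of the list, which is why the citing statements report BEDROC at several α values.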
“…[15][16][17][18] However, the ROC metric has one important shortcoming: it is unable to address the so-called "early recognition" problem. 13,14 Since usually only a small fraction of a VS ranking can be tested experimentally, a good metric for VS should reflect the enrichment of actives at the beginning of the ranking. Recent efforts have sought to develop VS performance metrics that combine the statistic stability of ROC with the early recognition properties of RTR or EF.…”
Section: Introduction (mentioning)
confidence: 99%
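
As a rough illustration of "enrichment of actives at the beginning of the ranking", the following Python sketch computes the enrichment factor (EF) for a top fraction of a ranked list. It assumes the usual definition, the hit rate in the top fraction divided by the hit rate in the whole list. The function name `enrichment_factor` and the boolean-list input are hypothetical choices made for this example.

```python
def enrichment_factor(is_active, fraction):
    """EF for the top `fraction` of a ranked list (best-scored compound first).

    `is_active` holds one boolean per compound in ranked order.
    """
    n_total = len(is_active)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    # Number of compounds in the selected top fraction, at least one.
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(is_active[:n_top])
    # Hit rate in the top fraction relative to the overall hit rate.
    return (hits_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Hypothetical rankings of 100 compounds with 10 actives each.
    early = [i < 10 for i in range(100)]        # all actives ranked first
    spread = [i % 10 == 0 for i in range(100)]  # one active in every ten
    print(enrichment_factor(early, 0.10))
    print(enrichment_factor(spread, 0.10))
```

An EF of 1 corresponds to the hit rate of a random selection, so values above 1 indicate that actives are concentrated in the part of the list that would actually be tested.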
“…Recent efforts have sought to develop VS performance metrics that combine the statistic stability of ROC with the early recognition properties of RTR or EF. 13,14 However, due to their novelty, these metrics have not yet found extensive use in VS validation studies and it is therefore difficult to compare the results obtained with these metrics to those of other works. Thus, in this study, RTR and ROC were used in a complementary manner for the analysis of VS rankings.…”
Section: Introduction (mentioning)
confidence: 99%