Scoring functions are widely applied to the evaluation of protein-ligand binding in structure-based drug design. We have conducted a comparative assessment of 16 popular scoring functions implemented in mainstream commercial software or released by academic research groups. A set of 195 diverse protein-ligand complexes with high-resolution crystal structures and reliable binding constants was selected through a systematic nonredundant sampling of the PDBbind database and used as the primary test set in our study. All scoring functions were evaluated in three aspects, namely "docking power", "ranking power", and "scoring power", and all evaluations were independent of the context of molecular docking or virtual screening. As for "docking power", six scoring functions, including GOLD::ASP, DS::PLP1, DrugScore(PDB), GlideScore-SP, DS::LigScore, and GOLD::ChemScore, achieved success rates over 70% when the acceptance cutoff was a root-mean-square deviation < 2.0 Å. Combining these scoring functions into consensus scoring schemes improved the success rates to 80% or even higher. As for "ranking power" and "scoring power", the top four scoring functions on the primary test set were X-Score, DrugScore(CSD), DS::PLP, and SYBYL::ChemScore. They were able to correctly rank the protein-ligand complexes containing the same type of protein with success rates around 50%. Correlation coefficients between the experimental binding constants and the binding scores computed by these scoring functions ranged from 0.545 to 0.644. Besides the primary test set, each scoring function was also tested on four additional test sets, each consisting of a certain number of protein-ligand complexes containing one particular type of protein. Our study serves as an updated benchmark for evaluating the general performance of today's scoring functions. Our results indicate that no single scoring function consistently outperforms the others in all three aspects. Thus, it is important in practice to choose the appropriate scoring functions for different purposes.
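The two headline metrics in this abstract can be illustrated with a minimal Python sketch. It assumes that "scoring power" is summarized by a Pearson correlation coefficient between experimental binding constants and computed scores, and that "docking power" is the fraction of complexes whose top-scored pose falls under the 2.0 Å RMSD cutoff. The function names and inputs are hypothetical and are not taken from the study's own tooling.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def docking_success_rate(top_pose_rmsds, cutoff=2.0):
    """Fraction of complexes whose best-scored pose has RMSD < cutoff (in Å).

    top_pose_rmsds holds one value per complex: the RMSD between the
    top-scored pose and the crystal pose (an assumed input layout).
    """
    hits = sum(1 for rmsd in top_pose_rmsds if rmsd < cutoff)
    return hits / len(top_pose_rmsds)
```

For example, `docking_success_rate([0.5, 1.9, 2.0, 3.1])` counts the first two complexes as successes, because the cutoff is a strict inequality.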
Our comparative assessment of scoring functions (CASF) benchmark was created to provide an objective evaluation of current scoring functions. The key idea of CASF is to compare the general performance of scoring functions on a diverse set of protein-ligand complexes. To avoid testing scoring functions in the context of molecular docking, the scoring process is separated from the docking (or sampling) process by using ensembles of ligand binding poses that are generated in advance. Here, we describe the technical methods and evaluation results of the latest CASF-2013 study. The PDBbind core set (version 2013), which consists of 195 protein-ligand complexes with high-quality three-dimensional structures and reliable binding constants, was employed as the primary test set in this study. A panel of 20 scoring functions, most of which are implemented in mainstream commercial software, was evaluated in terms of "scoring power" (binding affinity prediction), "ranking power" (relative ranking prediction), "docking power" (binding pose prediction), and "screening power" (discrimination of true binders from random molecules). Our results reveal that the performance of these scoring functions is generally more promising in the docking/screening power tests than in the scoring/ranking power tests. Top-ranked scoring functions in the scoring power test, such as X-Score(HM), ChemScore@SYBYL, ChemPLP@GOLD, and PLP@DS, are also top-ranked in the ranking power test. Top-ranked scoring functions in the docking power test, such as ChemPLP@GOLD, ChemScore@GOLD, GlideScore-SP, LigScore@DS, and PLP@DS, are also top-ranked in the screening power test. Our results obtained on the entire test set and its subsets suggest that the real challenge in protein-ligand binding affinity prediction lies in polar interactions and the associated desolvation effects. Nonadditive features observed among high-affinity protein-ligand complexes also need attention.
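As a rough illustration of "ranking power", the sketch below counts a cluster of complexes sharing the same protein as a success only when the score-based order matches the experimental order exactly. That success criterion, the convention that a higher score means stronger predicted binding, and the function name are all assumptions made for illustration; the abstract does not spell out the exact rule.

```python
def ranking_success_rate(clusters):
    """Fraction of clusters whose members are ordered identically by score
    and by experimental affinity.

    clusters is a list of clusters; each cluster is a list of
    (experimental_affinity, score) pairs for complexes sharing one protein.
    Assumes higher values mean stronger binding for both quantities.
    """
    correct = 0
    for members in clusters:
        indices = range(len(members))
        order_by_experiment = sorted(indices, key=lambda i: members[i][0])
        order_by_score = sorted(indices, key=lambda i: members[i][1])
        if order_by_experiment == order_by_score:
            correct += 1
    return correct / len(clusters)
```

A scoring function whose sign convention is reversed (lower score means stronger binding) would need its scores negated before being passed in.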
Scoring functions are often applied in combination with molecular docking methods to predict ligand binding poses and ligand binding affinities or to identify active compounds through virtual screening. An objective benchmark for assessing the performance of current scoring functions is expected to offer users practical guidance in choosing among available methods. It can also reveal common weaknesses in current methods to guide future improvements. The primary goal of our comparative assessment of scoring functions (CASF) project is to provide a high-standard, publicly accessible benchmark of this type. Our latest study, CASF-2013, evaluated 20 popular scoring functions on an updated set of protein-ligand complexes. This data set was selected out of 8302 protein-ligand complexes recorded in the PDBbind database (version 2013) through a multistep process. Sample selection was made by considering the quality of complex structures as well as binding data. Finally, qualified complexes were clustered by 90% similarity in protein sequences. Three representative complexes were chosen from each cluster to control sample redundancy. The final outcome, namely, the PDBbind core set (version 2013), consists of 195 protein-ligand complexes in 65 clusters with binding constants spanning nearly 10 orders of magnitude. In this data set, 82% of the ligand molecules are "druglike" and 78% of the protein molecules are validated or potential drug targets. Correlations between binding constants and several key properties of ligands are discussed. Methods and results of the scoring function evaluation will be described in a companion work in this issue (doi: 10.1021/ci500081m).
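The final selection step (three representatives per sequence cluster, giving 65 × 3 = 195 complexes) can be sketched as follows. The abstract does not state how the three representatives are picked, so the rule used here, taking the weakest, median, and strongest binder of each cluster, is an assumption for illustration only. The input layout and the function name are likewise hypothetical.

```python
from collections import defaultdict


def pick_representatives(complexes):
    """Pick three complexes per cluster: lowest, median, and highest affinity.

    complexes is an iterable of (pdb_id, cluster_id, affinity) tuples, where
    cluster_id comes from a prior sequence-similarity clustering step.
    Clusters with fewer than three members are skipped (assumed rule).
    """
    clusters = defaultdict(list)
    for pdb_id, cluster_id, affinity in complexes:
        clusters[cluster_id].append((affinity, pdb_id))

    selected = {}
    for cluster_id, members in clusters.items():
        if len(members) < 3:
            continue
        members.sort()  # ascending by affinity
        middle = members[len(members) // 2]
        selected[cluster_id] = [members[0][1], middle[1], members[-1][1]]
    return selected
```

Applied to 65 qualifying clusters, a rule of this kind yields a set of 195 complexes, which matches the size reported above.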