In structure-based drug design, scoring functions are often employed to evaluate protein−ligand interactions. A variety of scoring functions have been developed so far, and thus objective benchmarks are needed for assessing their strengths and weaknesses. The comparative assessment of scoring functions (CASF) benchmark developed by us provides an answer to this demand. CASF is designed as a "scoring benchmark", where the scoring process is decoupled from the docking process to depict the performance of a scoring function more precisely. Here, we describe the latest update of this benchmark, i.e., CASF-2016. Each scoring function is still evaluated by four metrics: "scoring power", "ranking power", "docking power", and "screening power". Nevertheless, the evaluation methods have been improved considerably in several aspects. A new test set has been compiled, which consists of 285 protein−ligand complexes with high-quality crystal structures and reliable binding constants. A panel of 25 scoring functions is tested on CASF-2016 as a demonstration. Our results reveal that the performance of current scoring functions is more promising in terms of docking power than in terms of scoring, ranking, and screening power. Scoring power is somewhat correlated with ranking power, as are docking power and screening power. The results obtained on CASF-2016 may provide valuable guidance for end users to make smart choices among available scoring functions. Moreover, CASF is created as an open-access benchmark so that other researchers can utilize it to test a wider range of scoring functions. The complete CASF-2016 benchmark will be released on the PDBbind-CN web server (http://www.pdbbind-cn.org/casf.asp/) once this article is published.
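As a rough illustration of how two of these metrics could be computed, the sketch below assumes that scoring power is summarized by the Pearson correlation between predicted scores and experimental binding constants, and ranking power by the Spearman rank correlation. The abstract does not define the metrics, so both choices are assumptions, and the numbers are made up purely for demonstration.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical values, NOT taken from CASF-2016.
experimental = [4.2, 5.1, 6.3, 7.8, 9.0]  # e.g. binding constants in -log units
predicted = [4.9, 5.0, 6.9, 6.5, 8.1]     # scores from some scoring function

scoring_power = pearson(predicted, experimental)
ranking_power = spearman(predicted, experimental)
```

Docking power and screening power depend on decoy poses and decoy ligands, so they are not covered by this sketch.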
We have performed a comprehensive analysis of water molecules at the protein-ligand interfaces observed in 392 high-resolution crystal structures. There are a total of 1829 ligand-bound water molecules in these 392 complexes; 18% are surface water molecules, and 72% are interfacial water molecules. The number of ligand-bound water molecules in each complex structure ranges from 0 to 21 and has an average of 4.6. Of these interfacial water molecules, 76% are considered to be bridging water molecules, characterized by having polar interactions with both ligand and protein atoms. Among a number of factors that may influence the number of ligand-bound water molecules, the polar van der Waals (vdw) surface area of ligands has the highest Pearson linear correlation coefficient of 0.63. Our regression analysis predicted that one more ligand-bound water molecule is expected for every additional 24 Å² in the polar vdw surface area of the ligand. In contrast to the observation that the resolution is the primary factor influencing the number of water molecules in crystallographic models of proteins, we found that there is only a weak relationship between the number of ligand-bound water molecules and the resolution of the crystal structures. An analysis of the isotropic B factors of buried ligand-bound water molecules suggested that, when water molecules have fewer than two polar interactions with the protein-ligand complex, they are more mobile than protein atoms in the crystal structures; when they have more than three polar interactions, they are significantly less mobile than protein atoms.
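The regression result above can be read as a slope of roughly 1/24 water molecules per Å² of polar vdw surface area. The sketch below only converts a difference in polar surface area into an expected difference in ligand-bound water count; the abstract gives no intercept, so absolute counts cannot be predicted from it. The function name and the constant name are illustrative.

```python
# Polar vdW surface area (in Å²) associated with one additional
# ligand-bound water molecule, as stated in the text.
AREA_PER_WATER = 24.0

def expected_extra_waters(delta_polar_area):
    """Expected change in ligand-bound water count for a change in
    the ligand's polar vdW surface area (Å²), using the reported slope."""
    return delta_polar_area / AREA_PER_WATER

# A ligand with 48 Å² more polar surface than another is expected to
# bind about two more water molecules, on average.
extra = expected_extra_waters(48.0)
```

With a correlation coefficient of 0.63, individual complexes can deviate substantially from this average trend.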
In recent years, protein−ligand interaction scoring functions derived through machine learning have repeatedly been reported to outperform conventional scoring functions. However, several published studies have argued that the superior performance of machine-learning scoring functions depends on the overlap between the training set and the test set. In order to examine the true power of machine-learning algorithms in scoring function formulation, we have conducted a systematic study of six off-the-shelf machine-learning algorithms: Bayesian Ridge Regression (BRR), Decision Tree (DT), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Linear Support Vector Regression (L-SVR), and Random Forest (RF). Model scoring functions were derived with these machine-learning algorithms on various training sets selected from over 3700 protein−ligand complexes in the PDBbind refined set (version 2016). All resulting scoring functions were then applied to the CASF-2016 test set to validate their scoring power. In our first series of trials, the size of the training set was fixed, while the overall similarity between the training set and the test set was varied systematically. In our second series of trials, the overall similarity between the training set and the test set was fixed, while the size of the training set was varied. Our results indicate that the performance of those machine-learning models is more or less dependent on the contents or the size of the training set, where the RF model demonstrates the best learning capability. In contrast, the performance of three conventional scoring functions (i.e., ChemScore, ASP, and X-Score) is basically insensitive to the use of different training sets. Therefore, one has to consider not only "hard overlap" but also "soft overlap" between the training set and the test set in order to evaluate machine-learning scoring functions.
In this spirit, we have compiled data sets based on the PDBbind refined set by removing redundant samples under several similarity thresholds. Scoring function developers are encouraged to employ them as standard training sets if they want to evaluate their new models on the CASF-2016 benchmark.
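A minimal sketch of how such a non-redundant training set might be derived, assuming "redundant" means a training complex whose similarity to any test-set complex reaches a chosen threshold. The abstract does not specify the similarity measure, so the Jaccard similarity over toy feature sets used here is only a placeholder, and all sample names and features are hypothetical.

```python
def jaccard(a, b):
    """Placeholder similarity: Jaccard index of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def remove_soft_overlap(training, test, similarity, threshold):
    """Keep only training samples whose similarity to every test sample
    is strictly below the threshold."""
    return [
        sample for sample in training
        if all(similarity(sample, t) < threshold for t in test)
    ]

# Hypothetical samples: (identifier, feature set). Not real PDBbind entries.
training = [
    ("c1", {"A", "B", "C"}),
    ("c2", {"A", "X", "Y"}),
    ("c3", {"P", "Q"}),
]
test = [("t1", {"A", "B", "C", "D"})]

def sample_similarity(s, t):
    # Compare the feature sets only, ignoring the identifiers.
    return jaccard(s[1], t[1])

# A stricter (lower) threshold removes more training samples.
kept_050 = [name for name, _ in remove_soft_overlap(training, test, sample_similarity, 0.50)]
kept_010 = [name for name, _ in remove_soft_overlap(training, test, sample_similarity, 0.10)]
```

Running the filter at several thresholds yields a series of training sets with decreasing similarity to the test set, which is the kind of series the text describes.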