In structure-based drug design, scoring functions are often employed to evaluate protein−ligand interactions. A variety of scoring functions have been developed so far, and thus objective benchmarks are needed for assessing their strengths and weaknesses. The comparative assessment of scoring functions (CASF) benchmark developed by us provides an answer to this demand. CASF is designed as a "scoring benchmark", where the scoring process is decoupled from the docking process to characterize the performance of each scoring function more precisely. Here, we describe the latest update of this benchmark, i.e., CASF-2016. Each scoring function is still evaluated by four metrics, namely "scoring power", "ranking power", "docking power", and "screening power". Nevertheless, the evaluation methods have been improved considerably in several aspects. A new test set has been compiled, which consists of 285 protein−ligand complexes with high-quality crystal structures and reliable binding constants. A panel of 25 scoring functions is tested on CASF-2016 as a demonstration. Our results reveal that the performance of current scoring functions is more promising in terms of docking power than in terms of scoring, ranking, and screening power. Scoring power is somewhat correlated with ranking power, as are docking power and screening power. The results obtained on CASF-2016 may provide valuable guidance for end users to make smart choices among available scoring functions. Moreover, CASF is created as an open-access benchmark so that other researchers can utilize it to test a wider range of scoring functions. The complete CASF-2016 benchmark will be released on the PDBbind-CN web server (http://www.pdbbind-cn.org/casf.asp/) once this article is published.
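The abstract above names the metrics but does not say how they are computed. As a minimal sketch, assuming that scoring power is quantified as the Pearson correlation between predicted scores and experimental binding constants, and that ranking power is the mean Spearman rank correlation over groups of complexes sharing a target, the two quantities might be computed as follows. The function names and the grouping into `clusters` are illustrative assumptions, not the benchmark's own code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def scoring_power(scores, affinities):
    """Assumed definition: Pearson r between scores and binding constants."""
    return pearson(scores, affinities)


def ranking_power(clusters):
    """Assumed definition: mean Spearman rho over (scores, affinities) clusters."""
    rhos = [spearman(scores, affinities) for scores, affinities in clusters]
    return sum(rhos) / len(rhos)
```

Under these assumptions, a scoring function that orders every cluster correctly but is not linear in the binding constants would reach a ranking power of 1 while its scoring power stays below 1, which is one way the two metrics can be "somewhat correlated" without being identical.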
In recent years, protein−ligand interaction scoring functions derived through machine learning have repeatedly been reported to outperform conventional scoring functions. However, several published studies have raised the concern that the superior performance of machine-learning scoring functions depends on the overlap between the training set and the test set. In order to examine the true power of machine-learning algorithms in scoring function formulation, we have conducted a systematic study of six off-the-shelf machine-learning algorithms: Bayesian Ridge Regression (BRR), Decision Tree (DT), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Linear Support Vector Regression (L-SVR), and Random Forest (RF). Model scoring functions were derived with these machine-learning algorithms on various training sets selected from over 3700 protein−ligand complexes in the PDBbind refined set (version 2016). All resulting scoring functions were then applied to the CASF-2016 test set to validate their scoring power. In our first series of trials, the size of the training set was fixed, while the overall similarity between the training set and the test set was varied systematically. In our second series of trials, the overall similarity between the training set and the test set was fixed, while the size of the training set was varied. Our results indicate that the performance of those machine-learning models is more or less dependent on the contents or the size of the training set, where the RF model demonstrates the best learning capability. In contrast, the performance of three conventional scoring functions (i.e., ChemScore, ASP, and X-Score) is basically insensitive to the use of different training sets. Therefore, one has to consider not only "hard overlap" but also "soft overlap" between the training set and the test set in order to evaluate machine-learning scoring functions.
In this spirit, we have compiled data sets based on the PDBbind refined set by removing redundant samples under several similarity thresholds. Scoring function developers are encouraged to employ them as standard training sets if they want to evaluate their new models on the CASF-2016 benchmark.
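The idea of removing training samples that are too similar to the test set can be sketched as below. The abstract does not state which similarity measures were used, so this example substitutes a Tanimoto (Jaccard) similarity over hypothetical feature sets purely for illustration; `tanimoto`, `filter_training_set`, and the dictionary layout are all assumptions rather than the authors' procedure.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets (a stand-in measure)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def filter_training_set(train, test, threshold, similarity=tanimoto):
    """Keep training samples whose similarity to every test sample is <= threshold.

    `train` and `test` map a sample name to its (hypothetical) feature set.
    """
    kept = {}
    for name, features in train.items():
        # Highest similarity of this training sample to any test sample.
        max_sim = max(
            (similarity(features, t) for t in test.values()), default=0.0
        )
        if max_sim <= threshold:
            kept[name] = features
    return kept


if __name__ == "__main__":
    # Toy data: names and feature sets are made up for the example.
    train = {"a": {1, 2, 3}, "b": {7, 8}, "c": {1, 2, 9}}
    test = {"t": {1, 2, 3}}
    for cutoff in (1.0, 0.5, 0.4):
        print(cutoff, sorted(filter_training_set(train, test, cutoff)))
```

Lowering the threshold removes progressively more of the "soft overlap" with the test set, which mirrors the first series of trials, where similarity is varied. Note that, unlike in those trials, this sketch does not hold the training set size fixed.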
Human heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) serves as a key regulatory protein in RNA metabolism. Malfunction of hnRNPA1 in nucleo-cytoplasmic transport or dynamic phase separation leads to abnormal amyloid aggregation and neurodegeneration. The low complexity (LC) domain of hnRNPA1 drives both dynamic phase separation and amyloid aggregation. Here, we use cryo-electron microscopy to determine the structure of the amyloid fibril formed by the hnRNPA1 LC domain. Remarkably, the structure reveals that the nuclear localization sequence of hnRNPA1 (termed PY-NLS), which was originally known to mediate the nucleo-cytoplasmic transport of hnRNPA1 through binding with karyopherin-β2 (Kapβ2), represents the major component of the fibril core. The residues that contribute to the binding of PY-NLS with Kapβ2 also form key molecular interactions that stabilize the fibril structure. Notably, hnRNPA1 mutations found in familial amyotrophic lateral sclerosis (ALS) and multisystem proteinopathy (MSP) are all located in the fibril core and contribute to fibril stability. Our work provides a structural understanding of the pathological amyloid aggregation of hnRNPA1 and the amyloid disaggregase activity of Kapβ2, and highlights the multiple roles of PY-NLS in hnRNPA1 homeostasis.