Protein–ligand binding affinity reflects the equilibrium thermodynamics of the protein–ligand binding process. Binding/unbinding kinetics is the other side of the coin. Computational models of the quantitative structure–kinetics relationship (QSKR) aim to predict protein–ligand binding/unbinding kinetics from the protein structure, the ligand structure, or the structure of their complex, which in principle can provide a more rational basis for structure-based drug design. Thus far, most of the public data sets used for deriving such QSKR models have been rather limited in sample size and structural diversity. To tackle this problem, we have compiled a set of 680 protein–ligand complexes with experimental dissociation rate constants (koff), which were mainly curated from the references accumulated for updating our PDBbind database. The three-dimensional structure of each protein–ligand complex in this data set was either retrieved from the Protein Data Bank or carefully modeled based on a proper template. The entire data set covers 155 types of protein, with koff values spanning nearly 10 orders of magnitude. To the best of our knowledge, this data set is the largest of its kind reported publicly. Utilizing this data set, we derived a random forest (RF) model based on protein–ligand atom pair descriptors for predicting koff values. We also demonstrated that using modeled structures as additional training samples benefits model performance. The RF model trained on mixed structures can serve as a baseline for testing other, more sophisticated QSKR models. The whole data set, namely PDBbind-koff-2020, is available for free download at our PDBbind-CN web site.
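To make the "10 orders of magnitude" scale concrete, the sketch below converts a dissociation rate constant into the quantities usually quoted alongside it: the residence time (1/koff), the dissociation half-life (ln 2/koff), and log10(koff). The function names are illustrative, and koff is assumed to be given in s^-1. Working on a log10 scale is a common convention for rate constants that span many decades; the abstract does not state which scale the RF model was trained on.

```python
import math


def residence_time(k_off):
    """Residence time in seconds for a first-order dissociation, tau = 1 / koff."""
    return 1.0 / k_off


def dissociation_half_life(k_off):
    """Time in seconds for half of the complex to dissociate, t_1/2 = ln(2) / koff."""
    return math.log(2) / k_off


def orders_of_magnitude(k_off_values):
    """Span of a set of koff values expressed in decades (log10 units)."""
    logs = [math.log10(k) for k in k_off_values]
    return max(logs) - min(logs)


# Hypothetical values for illustration only, not taken from the data set.
example_koff = [1e-6, 1e-3, 5e1]  # assumed units: s^-1
for k in example_koff:
    print(f"koff={k:g}  tau={residence_time(k):g} s  log10={math.log10(k):.2f}")
```

A slow-dissociating ligand therefore has a small koff and a long residence time, which is why the two quantities are often used interchangeably in discussions of binding kinetics.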
An increasing number of recent studies have shown that the binding kinetics of a drug molecule to its target correlates strongly with its efficacy in vivo. Ligand optimization oriented toward improved binding kinetics therefore provides new ideas for rational drug design. Currently, ligand binding kinetics is modeled mainly through extensive molecular dynamics simulations, which limits its application to real-world problems. The present study aimed to obtain a general-purpose quantitative structure-kinetics relationship (QSKR) model for predicting the dissociation rate constant (koff) of a ligand based on the structure of its complex. This type of model is expected to be suitable for high-throughput tasks in structure-based drug design. We collected experimentally measured koff values for 406 ligand molecules from the literature and then constructed a three-dimensional structural model for each protein-ligand complex through molecular modeling. A training set was compiled from 60% of those complexes, while the remaining 40% were assigned to two test sets. A random forest algorithm was adopted to derive a QSKR model from distance-dependent protein-ligand atom pair descriptors. Various random forest models were generated from descriptor sets obtained under different conditions, such as distance cutoff, bin width, and feature selection criteria, and their cross-validation results were examined. The optimal model was obtained when the distance cutoff was 15 Å (1 Å = 0.1 nm), the bin width was 3 Å, and the feature selection variance level was 2. The final QSKR model produced correlation coefficients of around 0.62 on the two independent test sets. This level of accuracy is at least comparable to that of the predictive models described in the literature, which are typically much more computationally expensive. Our study attempts to address the issue of predicting koff values in drug design. We hope that it can provide inspiration for further studies by other researchers.
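The distance-dependent atom pair descriptors mentioned in the abstract can be illustrated with a minimal sketch. It counts protein-ligand atom pairs per (protein atom type, ligand atom type, distance bin), using the reported 15 Å cutoff and 3 Å bin width as defaults. The atom typing by element symbol, the input format, and the function name are assumptions made for illustration; the abstract does not specify the authors' atom types or implementation.

```python
from collections import Counter
from math import dist


def atom_pair_descriptor(protein_atoms, ligand_atoms, cutoff=15.0, bin_width=3.0):
    """Count protein-ligand atom pairs per (protein type, ligand type, distance bin).

    Each atom is a tuple (atom_type, (x, y, z)) with coordinates in angstroms.
    Pairs at or beyond the cutoff are ignored.
    """
    n_bins = int(cutoff / bin_width)
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            d = dist(p_xyz, l_xyz)
            if d < cutoff:
                # Clamp to the last bin to guard against floating-point edge cases.
                bin_index = min(int(d // bin_width), n_bins - 1)
                counts[(p_type, l_type, bin_index)] += 1
    return counts


# Toy coordinates, purely hypothetical (element symbols stand in for atom types).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (4.0, 0.0, 0.0)), ("O", (20.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("O", (0.0, 7.0, 0.0))]

desc = atom_pair_descriptor(protein, ligand)
for key in sorted(desc):
    print(key, desc[key])
```

With these defaults each atom-type pair contributes five distance bins (0-3, 3-6, ..., 12-15 Å). Flattening the counts into a fixed-order vector would give one feature row per complex, which is the kind of input a random forest regressor expects.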