Protein–ligand binding affinity reflects the equilibrium
thermodynamics of the protein–ligand binding process. Binding/unbinding
kinetics is the other side of the coin. Computational models for interpreting
the quantitative structure–kinetics relationship (QSKR) aim
at predicting protein–ligand binding/unbinding kinetics based
on protein structure, ligand structure, or their complex structure,
which in principle can provide a more rational basis for structure-based
drug design. Thus far, most of the public data sets used for deriving
such QSKR models are rather limited in sample size and structural
diversity. To tackle this problem, we have compiled a set of 680 protein–ligand
complexes with experimental dissociation rate constants (k_off), which were mainly curated from the references accumulated
for updating our PDBbind database. The three-dimensional structure of
each protein–ligand complex in this data set was either retrieved
from the Protein Data Bank or carefully modeled based on a proper
template. The entire data set covers 155 types of protein, with their
dissociation kinetic constants (k_off)
spanning nearly 10 orders of magnitude. To the best of our knowledge,
this data set is the largest of its kind reported publicly. Utilizing
this data set, we derived a random forest (RF) model based on protein–ligand
atom pair descriptors for predicting k_off values. We also demonstrated that utilizing modeled structures as
additional training samples improves model performance. The
RF model with mixed structures can serve as a baseline for testing
other more sophisticated QSKR models. The whole data set, namely,
PDBbind-koff-2020, is available for free download at our
PDBbind-CN web site (
).
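
As a rough illustration of what protein–ligand atom pair descriptors can look like, the sketch below counts element pairs between protein and ligand atoms that lie within a distance cutoff and returns them as a fixed-length feature vector. It is a minimal sketch under stated assumptions, not the descriptor scheme used for the RF model: the element vocabularies, the 6.0 Å cutoff, the single distance shell, and the names `atom_pair_counts` and `log_koff` are all hypothetical. Using log10(k_off) as the regression target is likewise an assumption, suggested only by the fact that the k_off values span nearly 10 orders of magnitude.

```python
import math
from itertools import product

# Hypothetical element vocabularies (assumption, not taken from the paper).
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "S", "P", "F", "Cl", "Br", "I")

# One feature per (protein element, ligand element) combination.
PAIR_TYPES = [f"{p}-{l}" for p, l in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)]


def atom_pair_counts(protein_atoms, ligand_atoms, cutoff=6.0):
    """Count protein-ligand element pairs within `cutoff` angstroms.

    Each atom is a tuple (element, x, y, z). The result is a list ordered
    like PAIR_TYPES; pairs involving unlisted elements are ignored.
    The cutoff value is an illustrative assumption.
    """
    counts = dict.fromkeys(PAIR_TYPES, 0)
    for p_elem, *p_xyz in protein_atoms:
        for l_elem, *l_xyz in ligand_atoms:
            key = f"{p_elem}-{l_elem}"
            if key in counts and math.dist(p_xyz, l_xyz) <= cutoff:
                counts[key] += 1
    return [counts[key] for key in PAIR_TYPES]


def log_koff(koff_per_second):
    """Assumed regression target: log10 of k_off given in s^-1."""
    return math.log10(koff_per_second)


# Toy complex with made-up coordinates (not from the data set).
protein = [("C", 0.0, 0.0, 0.0), ("O", 10.0, 0.0, 0.0)]
ligand = [("N", 3.0, 0.0, 0.0), ("C", 0.0, 4.0, 0.0)]
features = atom_pair_counts(protein, ligand)
target = log_koff(1e-3)
```

Feature vectors of this kind, paired with log-scaled k_off targets, could then be passed to any random forest regressor; a fixed feature ordering keeps vectors comparable across complexes.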