Many proteins have small-molecule binding pockets that are not easily
detectable in their ligand-free structures. These cryptic sites require a
conformational change to become apparent; a cryptic site can therefore be
defined as a site that forms a pocket in a ligand-bound (holo) structure, but
not in the ligand-free (apo) structure. Because many proteins appear to lack
druggable pockets, understanding and accurately identifying cryptic sites could
expand the set of drug targets. Previously, cryptic sites were identified
experimentally by fragment-based ligand discovery, and computationally by long
molecular dynamics simulations and fragment docking. Here, we begin by
constructing a set of structurally defined apo-holo pairs with
cryptic sites. Next, we comprehensively characterize the cryptic sites in terms
of their sequence, structure, and dynamics attributes. We find that cryptic
sites tend to be as conserved in evolution as traditional binding pockets, but
are less hydrophobic and more flexible. Relying on this characterization, we use
machine learning to predict cryptic sites with relatively high accuracy (for our
benchmark, the true positive and false positive rates are 73% and
29%, respectively). We then predict cryptic sites in the entire
structurally characterized human proteome (11,201 structures, covering
23% of all residues in the proteome). Our approach, CryptoSite, increases the size of
the potentially “druggable” human proteome from
~40% to ~78% of disease-associated proteins.
Finally, to demonstrate the utility of our approach in practice, we
experimentally validate a cryptic site in protein tyrosine phosphatase 1B using
a covalent ligand and NMR spectroscopy. The CryptoSite web server is available
at http://salilab.org/cryptosite.