In structure-based virtual screening (SBVS), a binding site on a protein
structure is used to search for ligands with favorable nonbonded interactions.
Because docking is computationally demanding, it is time-consuming,
and any docking user will eventually encounter a chemical library
that is too big to dock. This problem might arise because there is
not enough computing power or because preparing and storing so many
three-dimensional (3D) ligands requires too much space. In this study,
we show that high-quality regressors can be trained to predict
docking scores from molecular fingerprints. Whereas typical docking
has a screening rate of less than one ligand per second on one CPU
core, our regressors can predict about 5800 docking scores per second.
This approach allows us to focus docking on the portion of a database
that is predicted to have docking scores below a user-chosen threshold.
We show usage examples in which only 25% of a ligand database
is docked, without any significant loss of virtual screening performance.
We call this method “lean-docking”. To validate lean-docking,
a massive docking campaign using several state-of-the-art docking
software packages was undertaken on an unbiased data set, with only
wet-lab tested active and inactive molecules. Although regressors
allow the screening of a larger chemical space, even at a constant
docking power, it is also clear that significant progress in the virtual
screening power of docking scores is desirable.
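
The filtering idea described above can be illustrated with a minimal sketch. It is not the regressor used in this study: as a stand-in, it assumes a similarity-weighted k-nearest-neighbor predictor over bit-set fingerprints, and the names (predict_score, select_for_docking), the toy fingerprints, the scores, and the threshold are all hypothetical. Only the ligands whose predicted docking score falls below the user-chosen threshold would be passed on to actual docking.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modeled as the set of indices of its "on" bits (assumption).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_score(query: Fingerprint,
                  training: List[Tuple[Fingerprint, float]],
                  k: int = 3) -> float:
    """Similarity-weighted mean docking score of the k most similar
    training ligands (a stand-in regressor, not the study's model)."""
    neighbors = sorted(
        ((tanimoto(query, fp), score) for fp, score in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0.0:
        # No similar neighbor at all: fall back to a plain mean.
        return sum(score for _, score in neighbors) / len(neighbors)
    return sum(sim * score for sim, score in neighbors) / total


def select_for_docking(library: Dict[str, Fingerprint],
                       training: List[Tuple[Fingerprint, float]],
                       threshold: float,
                       k: int = 3) -> List[str]:
    """Keep only ligands whose predicted score is below the threshold
    (lower docking scores are assumed to be better)."""
    return [name for name, fp in library.items()
            if predict_score(fp, training, k) < threshold]


# Toy data: ligands that were already docked, with made-up scores.
training = [
    (frozenset({1, 2, 3, 4}), -9.0),
    (frozenset({1, 2, 3, 5}), -8.5),
    (frozenset({10, 11, 12}), -4.0),
    (frozenset({10, 11, 13}), -3.5),
]

# Toy library of ligands that have not been docked yet.
library = {
    "lig_a": frozenset({1, 2, 3, 6}),
    "lig_b": frozenset({10, 11, 14}),
}

to_dock = select_for_docking(library, training, threshold=-7.0, k=2)
print(to_dock)
```

In practice, the threshold controls the trade-off: a stricter (lower) threshold docks a smaller portion of the database at the risk of discarding ligands whose scores the regressor overestimates.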