Machine learning has shown enormous potential for computer-aided
drug discovery. Here we show how modern convolutional neural networks
(CNNs) can be applied to structure-based virtual screening. We have
coupled our densely connected CNN (DenseNet) with a transfer learning
approach, which we use to produce an ensemble of protein family-specific
models. We conduct an in-depth empirical study and provide the first
guidelines on the minimum requirements for adopting a protein family-specific
model. Our method also highlights the need for additional data, even
in data-rich protein families. Our approach outperforms recent benchmarks
on the DUD-E data set and an independent test set constructed from
the ChEMBL database. Using clustered cross-validation on DUD-E,
we achieve an average AUC ROC of 0.92 and a 0.5% ROC enrichment factor
of 79. This represents an improvement in early enrichment of over
75% compared to a recent machine learning benchmark. Our results demonstrate
that the continued improvements in machine learning architectures for
computer vision apply to structure-based virtual screening.
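
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they could be computed from a ranked screening list. It assumes the common definition of ROC enrichment as the ratio of true positive rate to false positive rate at a fixed false positive rate (0.5% here); the function names `roc_auc` and `roc_enrichment` are illustrative and are not taken from the authors' code. Tied scores between actives and decoys are not treated specially in the enrichment calculation.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly.

    labels: 1 for active, 0 for decoy. Ties count as half a win.
    Quadratic in the number of compounds, which is fine for a sketch.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    decoys = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def roc_enrichment(labels, scores, fpr=0.005):
    """TPR / FPR at the score threshold that admits a fraction `fpr` of decoys.

    Assumed definition (hypothetical helper, not the authors' implementation).
    """
    n_actives = sum(1 for y in labels if y == 1)
    n_decoys = len(labels) - n_actives
    # Number of decoys allowed above the threshold (small epsilon guards
    # against floating-point round-off in the product).
    max_fp = int(fpr * n_decoys + 1e-9)
    if n_actives == 0 or max_fp < 1:
        raise ValueError("too few actives or decoys to resolve this FPR")

    # Walk down the ranked list, counting actives until one more decoy
    # would exceed the allowed number of false positives.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            if fp == max_fp:
                break
            fp += 1

    tpr = tp / n_actives
    realised_fpr = max_fp / n_decoys
    return tpr / realised_fpr


# Toy example: 2 actives and 4 decoys, evaluated at a 25% FPR.
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))
print(roc_enrichment(labels, scores, fpr=0.25))
```

Under this definition, a perfect ranking at a 0.5% false positive rate gives a ceiling of 1/0.005 = 200, and a random ranking gives an expected value of 1.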