<p>Computer-Assisted
Synthesis Planning (CASP) has attracted considerable interest of late.
Herein we investigate a template-based retrosynthetic planning tool, trained on
a variety of datasets consisting of up to 17.5 million reactions. We
demonstrate that models trained on datasets such as internal Electronic
Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office
(USPTO) extracts are sufficient for the prediction of full synthetic routes to
compounds of interest in medicinal chemistry. We assessed the
models on 1,731 compounds from 41 virtual libraries for which experimental
results were known. Furthermore, we show that accuracy is a misleading metric
for assessing the ‘filter network’, and propose that the number of successfully
applied templates, in conjunction with the overall ability to generate full
synthetic routes, be examined instead. Using these measures, we found that the specificity
of the templates comes at the cost of generalizability and overall model
performance. This is supplemented by a comparison of the underlying datasets
and their corresponding models.</p>