Reaction condition recommendation is an essential element for the
realization of computer-assisted synthetic planning. Accurate suggestions
of reaction conditions are required for experimental validation and
can have a significant effect on the success or failure of an attempted
transformation. However, de novo condition recommendation remains
a challenging and under-explored problem and relies heavily on chemists’
knowledge and experience. In this work, we develop a neural-network
model to predict the chemical context (catalyst(s), solvent(s), reagent(s)),
as well as the temperature most suitable for any particular organic
reaction. Trained on ∼10 million examples from Reaxys, the
model is able to propose conditions where a close match to the recorded
catalyst, solvent, and reagent is found within the top-10 predictions
69.6% of the time, with top-10 accuracies for individual species reaching
80–90%. Temperature is predicted to within ±20 °C of the recorded
temperature in 60–70% of test cases,
with higher accuracy for cases with correct chemical context predictions.
The utility of the model is illustrated through several examples spanning
a range of common reaction classes. We also demonstrate that the model
implicitly learns a continuous numerical embedding of solvent and
reagent species that captures their functional similarity.
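
The two headline metrics above (top-10 accuracy and the fraction of temperatures within ±20 °C) can be made concrete with a minimal sketch. This is not the authors' evaluation code: the function names `top_k_accuracy` and `fraction_within_tolerance` are hypothetical, and the sketch assumes exact string matching on species names, whereas the abstract counts a "close match" to the recorded conditions.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    recorded: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of reactions whose recorded species is among the top-k predictions.

    Assumption: a hit is an exact string match, which is stricter than the
    "close match" criterion described in the abstract.
    """
    if len(ranked_predictions) != len(recorded):
        raise ValueError("predictions and recorded labels must have equal length")
    if not recorded:
        raise ValueError("at least one test case is required")
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, recorded)
        if truth in preds[:k]
    )
    return hits / len(recorded)


def fraction_within_tolerance(
    predicted_c: Sequence[float],
    recorded_c: Sequence[float],
    tol_c: float = 20.0,
) -> float:
    """Fraction of cases where |predicted - recorded| <= tol_c (degrees Celsius)."""
    if len(predicted_c) != len(recorded_c):
        raise ValueError("predicted and recorded temperatures must have equal length")
    if not recorded_c:
        raise ValueError("at least one test case is required")
    hits = sum(
        1 for p, r in zip(predicted_c, recorded_c) if abs(p - r) <= tol_c
    )
    return hits / len(recorded_c)


# Toy data, invented purely for illustration (not taken from Reaxys).
ranked_solvents = [["THF", "DMF"], ["water", "ethanol"], ["toluene"]]
recorded_solvents = ["DMF", "methanol", "toluene"]
solvent_top2 = top_k_accuracy(ranked_solvents, recorded_solvents, k=2)

predicted_temps = [25.0, 80.0, 110.0]
recorded_temps = [20.0, 120.0, 100.0]
temp_hit_rate = fraction_within_tolerance(predicted_temps, recorded_temps)
```

On the toy data, two of the three recorded solvents appear in the top two predictions, and two of the three predicted temperatures fall within 20 °C of the recorded value. A combined score for catalyst, solvent, and reagent together would require all three to match for the same ranked suggestion, which is why it is lower than the per-species figures.
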

Artificial intelligence and machine learning have demonstrated
their potential role in predictive chemistry and synthetic planning
of small molecules; there are at least a few reports of companies
incorporating in silico synthetic planning into their
overall approach to accessing target molecules. A data-driven synthesis
planning program is one component being developed and evaluated by
the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS)
consortium, comprising MIT and 13 chemical and pharmaceutical company
members. Together, we wrote this perspective to share how we think
predictive models can be integrated into medicinal chemistry synthesis
workflows, how they are currently used within MLPDS member companies,
and the outlook for this field.