Conspectus
Teaching computers to plan multistep syntheses of arbitrary target
molecules, including natural products, has been one of
the oldest challenges in chemistry, dating back to the 1960s. This
Account recapitulates two decades of our group’s work on the
software platform called Chematica, which very recently achieved this
long-sought objective and has been shown capable of planning synthetic
routes to complex natural products, several of which were validated
in the laboratory.
For the machine to plan syntheses at an expert level, it must know
the rules describing chemical reactions and use these rules to expand
and search the networks of synthetic options. The rules must be of
high quality: they must accurately delineate the scope of admissible
substituents, capture all relevant stereochemical information, and detect
potential reactivity conflicts and protection requirements. They
should yield only those synthons that are chemically stable and energetically
allowed (e.g., not too strained) and should be able to extrapolate
beyond examples already published in the literature. In parallel,
the network-search algorithms must be able to assign meaningful scores
to the sets of synthons they encounter and to make judicious choices about
which of the network’s branches to expand and when to withdraw from
unpromising ones. They must be able to strategize over multiple steps
to resolve intermittent reactivity conflicts, exchange functional
groups, or overcome local maxima of molecular complexity.
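
The interplay between reaction rules and a scored network search can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than Chematica's implementation: the rule table `TOY_RULES`, the set `PURCHASABLE`, the `score` function (which merely counts synthons still to be made), and the generic best-first strategy are assumptions chosen for brevity.

```python
import heapq
from itertools import count

# Hypothetical toy rules: product -> alternative sets of precursors (synthons).
TOY_RULES = {
    "T": [("A", "B"), ("C",)],
    "A": [("S1", "S2")],
    "C": [("D",)],
    "D": [("E",)],  # "E" has no rule and is not purchasable: a dead end
}
PURCHASABLE = {"S1", "S2", "B"}  # assumed starting materials


def score(synthons):
    """Assumed scoring function (lower is better): synthons still to be made."""
    return sum(1 for s in synthons if s not in PURCHASABLE)


def plan(target, max_expansions=100):
    """Best-first search: always expand the best-scoring set of synthons."""
    tie = count()  # unique tiebreaker so the heap never compares routes
    queue = [(score((target,)), 0, next(tie), (target,), [])]
    seen = set()
    for _ in range(max_expansions):
        if not queue:
            return None  # every branch was a dead end
        _, depth, _, synthons, route = heapq.heappop(queue)
        unsolved = [m for m in synthons if m not in PURCHASABLE]
        if not unsolved:
            return route  # all synthons are available starting materials
        key = frozenset(synthons)
        if key in seen:
            continue
        seen.add(key)
        mol = unsolved[0]
        rest = tuple(m for m in synthons if m != mol)
        # Apply every rule that makes `mol`; unpromising sets sink in the queue.
        for precursors in TOY_RULES.get(mol, []):
            new = tuple(sorted(set(rest + precursors)))
            step = route + [(mol, precursors)]
            heapq.heappush(queue, (score(new), depth + 1, next(tie), new, step))
    return None


print(plan("T"))
```

In this toy network the branch through "C" is never completed because a better-scoring set of synthons is found first, and calling `plan("C")` returns `None` once that branch runs out of applicable rules. A realistic scoring function would also have to tolerate temporary increases in complexity, which this sketch does not attempt.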
Meeting all these requirements makes the problem of computer-driven
retrosynthesis very multifaceted, combining expert and AI approaches
further supplemented by quantum-mechanical and molecular-mechanics
calculations. Development of Chematica has been a very long and gradual
process because all these components are needed. Any shortcuts (for
example, reliance on only expert or only data-based approaches) yield
chemically naïve and often erroneous syntheses, especially
for complex targets. On the bright side, once all the requisite algorithms
are implemented (as they now are), they not only streamline
conventional synthetic planning but also enable completely new modalities
that would challenge any human chemist, for example, synthesis with
multiple constraints imposed simultaneously or library-wide syntheses
in which the machine constructs “global plans” leading
to multiple targets and benefiting from the use of common intermediates.
These types of analyses will have a profound impact on the practice
of the chemical industry, enabling the design of more economical, greener,
and less hazardous pathways.
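
The idea of a “global plan” that benefits from common intermediates can be made concrete with a small sketch. The candidate routes in `CANDIDATE_ROUTES`, the cost measure (number of distinct intermediates to prepare), and the brute-force enumeration are all illustrative assumptions; they are not taken from Chematica.

```python
from itertools import product

# Hypothetical candidate routes: each route is the set of intermediates
# that must be prepared to reach the target.
CANDIDATE_ROUTES = {
    "T1": [{"I1", "I2"}, {"I3"}],
    "T2": [{"I1", "I4"}, {"I5", "I6"}],
    "T3": [{"I2", "I4"}, {"I7"}],
}


def global_plan(candidates):
    """Pick one route per target so that the fewest distinct intermediates
    are needed across the whole library (exhaustive search)."""
    targets = sorted(candidates)
    best_choice, best_cost = None, None
    for choice in product(*(candidates[t] for t in targets)):
        cost = len(set().union(*choice))  # shared intermediates count once
        if best_cost is None or cost < best_cost:
            best_choice, best_cost = choice, cost
    return dict(zip(targets, best_choice)), best_cost


routes, cost = global_plan(CANDIDATE_ROUTES)
print(routes, cost)
```

Here the shortest route for each target taken separately ({"I3"} for T1, {"I7"} for T3, plus either two-intermediate route for T2) requires four distinct intermediates in total, whereas the library-wide choice reuses I1, I2, and I4 and requires only three. Exhaustive enumeration grows exponentially with the number of targets, so it serves only to illustrate the objective.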