Initially part of the field of artificial intelligence, machine
learning (ML) has become a booming research area since branching out
into its own field in the 1990s. After three decades of refinement,
ML algorithms have accelerated scientific developments across a variety
of research topics. The field of small molecule design is no exception,
and an increasing number of researchers are applying ML techniques
to discover, generate, and optimize small
molecule compounds. The goal of this review is to provide simple
yet descriptive explanations of some of the ML algorithms most
commonly used in small molecule design, along with others
that are highly applicable to an experimentally focused audience.
The algorithms discussed here span three ML paradigms: supervised
learning, unsupervised learning, and ensemble methods. Examples from
the published literature will be provided for each algorithm. Some
common pitfalls of applying ML to biological and chemical data sets
will also be explained, alongside a brief summary of a few more advanced
paradigms, including reinforcement learning and semi-supervised learning.