This
Perspective outlines recent progress and future directions
for using machine learning (ML), a data-driven method, to address
critical questions in the design, synthesis, processing, and characterization
of biomacromolecules. Accomplishing these tasks requires navigating
vast and complex chemical and biological spaces, which is difficult
to do with reasonable speed. Using modern algorithms and supercomputers,
quantum physics methods can examine systems containing a few hundred
interacting species and determine the probability of finding them in
a particular region of phase space, thereby predicting their properties.
Likewise, modern approaches in chemistry and biomolecular simulation,
supported by high-performance computing, have yielded data sets of
ever-increasing size and intrinsically high complexity. Hence, using ML to extract relevant
information from these fields is of paramount importance to advance
our understanding of chemical and biomolecular systems. At the heart
of ML approaches lie statistical algorithms, which, by evaluating a
portion of a given data set, identify, learn, and
manipulate the underlying rules that govern the whole data set. The
key steps in ML are building a high-quality model to represent the
data, using it to make predictions, and eliminating sources of error.
Alongside a growing infrastructure of ML tools for complex problems,
an increasing number of aspects of the fundamental properties of
biomacromolecules are now being investigated with ML. These fields,
including those at the interface of polymer science and biology
(e.g., structure determination, de novo design, folding, and dynamics),
strive to adopt and exploit the transformative power of ML approaches,
which clearly have the potential to accelerate research on
biomacromolecules.
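As a minimal illustration of the generic workflow summarized here (learn from a portion of a data set, predict on the held-out remainder, and measure the error), the following sketch fits a straight line by ordinary least squares to hypothetical synthetic data. The Perspective does not prescribe any particular algorithm; the linear model, the 80/20 split, and the helper names fit_line, mean_abs_error, and train_and_evaluate are illustrative assumptions, not the authors' method.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (the 'model' learned from data)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def mean_abs_error(model, xs, ys):
    """Average absolute deviation between predictions and observed values."""
    a, b = model
    return mean(abs((a * x + b) - y) for x, y in zip(xs, ys))


def train_and_evaluate(data, train_fraction=0.8, seed=0):
    """Learn from a portion of the data set, then check predictions on the rest.

    Hypothetical helper: the split fraction and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    train, test = shuffled[:n_train], shuffled[n_train:]
    model = fit_line([x for x, _ in train], [y for _, y in train])
    error = mean_abs_error(model, [x for x, _ in test], [y for _, y in test])
    return model, error


if __name__ == "__main__":
    # Hypothetical toy data: a hidden linear "rule" y = 2x + 1 plus small noise.
    rng = random.Random(42)
    data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in range(50)]
    (a, b), err = train_and_evaluate(data)
    print(f"slope={a:.2f} intercept={b:.2f} held-out MAE={err:.3f}")
```

The held-out error is the quantity one would monitor while trying to eliminate sources of error; real biomacromolecular problems would substitute far richer models and descriptors for the single-variable line used here.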