Unsupervised learning is becoming an essential tool to analyze
the increasingly large amounts of data produced by atomistic and molecular
simulations in materials science, solid-state physics, biophysics,
and biochemistry. In this Review, we provide a comprehensive overview
of the methods of unsupervised learning that have been most commonly
used to investigate simulation data and indicate likely directions
for further developments in the field. In particular, we discuss
feature representation of molecular systems and present
state-of-the-art algorithms for dimensionality reduction, density
estimation, and clustering, as well as kinetic models. We divide our discussion into
self-contained sections, each devoted to a specific method. In each
section, we briefly touch upon the mathematical and algorithmic foundations
of the method, highlight its strengths and limitations, and describe
the specific ways in which it has been used, or can be used, to analyze
molecular simulation data.