Balancing accuracy and efficiency is a common problem in molecular
simulation. This tradeoff is evident in coarse-grained molecular dynamics
simulation, which prioritizes efficiency, and all-atom molecular simulation,
which prioritizes accuracy. Despite continuous efforts, creating a
coarse-grained model that accurately captures both the system’s
structure and dynamics remains elusive. In this article, we present
a data-driven approach for constructing coarse-grained models that
aim to describe both the structure and dynamics of the system equally
well. While the development of machine learning models is well-received
in the scientific community, the significance of dataset creation
for these models is often overlooked. However, data-driven approaches
cannot progress without a robust dataset. To address this, we construct
a database of synthetic coarse-grained potentials generated from unphysical
all-atom models. A neural network is trained on the generated database
to predict the coarse-grained potentials of real liquids. We evaluate
the predicted potentials by calculating a combined loss of structural and
dynamical accuracy upon coarse-graining. When we compare our machine learning-based
coarse-grained potential with the one from iterative Boltzmann inversion,
the machine learning prediction performs better for all eight hydrocarbon
liquids we studied. As all-atom surfaces become more nonspherical, both
coarse-graining approaches degrade. Still, the neural network outperforms
iterative Boltzmann inversion in constructing good quality coarse-grained
models for such cases. The synthetic database and the developed machine
learning models are freely available to the community, and we believe
that our approach will generate interest in efficiently deriving accurate
coarse-grained models for liquids.
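
For readers unfamiliar with the baseline method, the following is a minimal sketch of a single iterative Boltzmann inversion update in its commonly stated form, V_new(r) = V_old(r) + k_B T ln(g_current(r) / g_target(r)), where g denotes the radial distribution function. It is not taken from this work: the function name ibi_update, the tabulated-list representation, the kJ/mol units, and the handling of empty bins are all assumptions made for illustration.

```python
import math

# Boltzmann constant in kJ/(mol K); the unit choice is an assumption.
K_B = 0.0083144626


def ibi_update(potential, rdf_current, rdf_target, temperature, eps=1e-12):
    """Apply one iterative Boltzmann inversion step to a tabulated potential.

    All three input lists are assumed to be sampled on the same r grid.
    Bins where either radial distribution function is (near) zero are
    left unchanged, because the logarithm is undefined there.
    """
    kt = K_B * temperature
    updated = []
    for v, g_cur, g_tgt in zip(potential, rdf_current, rdf_target):
        if g_cur > eps and g_tgt > eps:
            # Too much density at this distance (g_cur > g_tgt) raises the
            # potential; too little density lowers it.
            v = v + kt * math.log(g_cur / g_tgt)
        updated.append(v)
    return updated
```

In practice this update is repeated, with a new coarse-grained simulation run after each step to obtain rdf_current, until the coarse-grained radial distribution function matches the all-atom target. Because the update uses only structural information, it does not by itself constrain the dynamics.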