<div>
<div>
<div>
<p>Efficiently selecting representative configurations for the high-level electronic structure calculations needed to develop many-body molecular models
remains a challenge for current data-driven approaches to molecular simulations. Here, we
introduce an active learning (AL) framework for generating training sets corresponding to
individual many-body contributions to the energy of an N-body system, which are required
for the development of MB-nrg potential energy functions (PEFs). Our AL framework is
based on uncertainty and error estimation, and uses Gaussian process regression (GPR)
to identify the most relevant configurations that are needed for an accurate representation
of the energy landscape of the molecular system under examination. Taking the Cs<sup>+</sup>–water system as a case study, we demonstrate that our AL framework yields
significantly smaller training sets than those previously used in the development of the original
MB-nrg PEF, without loss of accuracy. Considering the computational cost associated with
high-level electronic structure calculations for training set configurations, our AL framework is particularly well suited to the development of many-body PEFs with chemical and
spectroscopic accuracy for molecular simulations from the gas phase to the condensed phase.
</p>
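<p>The uncertainty-driven selection summarized above can be illustrated with a minimal, generic sketch. It is not the implementation used in this work: the one-dimensional <code>toy_energy</code> function is a hypothetical stand-in for an expensive reference calculation, the squared-exponential kernel and its length scale are assumed for illustration, and the selection rule (pick the pool configuration with the largest GPR predictive standard deviation) covers only the uncertainty part of the framework, not its error estimation.</p>

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between two sets of 1D inputs (assumed form)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gpr_predict(x_train, y_train, x_query, noise=1e-6):
    """Return the GPR posterior mean and standard deviation at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance k(x, x) = 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_energy(x):
    """Hypothetical stand-in for a high-level electronic structure calculation."""
    return np.sin(3.0 * x) + 0.5 * x ** 2

def active_learning(pool, n_initial=3, n_iterations=10, seed=0):
    """Grow a training set by repeatedly labeling the most uncertain pool point."""
    rng = np.random.default_rng(seed)
    selected = [int(i) for i in rng.choice(len(pool), size=n_initial, replace=False)]
    for _ in range(n_iterations):
        x_train = pool[selected]
        y_train = toy_energy(x_train)
        _, std = gpr_predict(x_train, y_train, pool)
        std[selected] = -1.0  # never re-select an already labeled configuration
        selected.append(int(np.argmax(std)))
    return selected

# Candidate "configurations": a dense 1D grid standing in for a configuration pool.
pool = np.linspace(-2.0, 2.0, 200)
selected = active_learning(pool)

# Evaluate the final model on the whole pool.
mean, std = gpr_predict(pool[selected], toy_energy(pool[selected]), pool)
rmse = float(np.sqrt(np.mean((mean - toy_energy(pool)) ** 2)))
print(f"training set size: {len(selected)}, pool RMSE: {rmse:.4f}")
```

<p>In this toy setting only 13 of the 200 candidate points are ever labeled. Each iteration spends one reference evaluation where the model is least certain, which is the general mechanism by which an uncertainty-based scheme can reduce training set size.</p>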
</div>
</div>
</div>