Formation and growth of atmospheric molecular clusters into aerosol
particles impact the global climate and contribute to the high uncertainty
in modern climate models. Cluster formation is usually studied using
quantum chemical methods, which quickly become computationally expensive
as system size grows. In this work, we present a large database
of ∼250k atmospherically relevant cluster structures, which can
be applied for developing machine learning (ML) models. We use the
database to train kernel ridge regression (KRR) models with the
FCHL19 representation. We test the ability of the models to extrapolate
from smaller to larger clusters, between different molecules, and
between equilibrium and out-of-equilibrium structures, as well as
their transferability to systems with new interactions. We show
that KRR models can extrapolate to larger sizes and transfer acid
and base interactions with mean absolute errors below 1 kcal/mol.
We suggest introducing an iterative ML step in configurational sampling
processes, which can reduce the computational expense. Such an approach
would allow us to study significantly more cluster systems at higher
accuracy than previously possible and thereby allow us to cover a
much larger part of the relevant atmospheric compounds.
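
As a rough illustration of the KRR approach mentioned above, the following is a minimal sketch of kernel ridge regression with a Gaussian kernel. It is not the implementation used in this work: the FCHL19 representation is not reproduced here, and the feature vectors, the target function, and the hyperparameters sigma and lam are all assumptions, with random vectors standing in for cluster representations and a smooth analytic function standing in for energies.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    # Solve (K + lam * I) alpha = y for the regression weights alpha.
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    # Prediction is a kernel-weighted sum over the training points.
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic stand-in data (hypothetical; not taken from the database).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 3))
y_train = np.sin(X_train).sum(axis=1)
X_test = rng.uniform(-1.0, 1.0, size=(50, 3))
y_test = np.sin(X_test).sum(axis=1)

sigma, lam = 1.0, 1e-6  # assumed hyperparameters, not tuned values
alpha = krr_fit(X_train, y_train, sigma, lam)
y_pred = krr_predict(X_train, alpha, X_test, sigma)
mae = np.mean(np.abs(y_pred - y_test))
print(f"test MAE: {mae:.2e}")
```

In a real application, the rows of X would be replaced by a molecular representation of each cluster and y by the corresponding reference energies, and sigma and lam would be chosen by cross-validation.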