The theoretical prediction of molecular electronic spectra by means
of quantum mechanical (QM) computations is fundamental to gaining deep
insight into many photophysical and photochemical processes. A computational
strategy that is attracting significant attention is the so-called
Nuclear Ensemble Approach (NEA), which relies on generating a representative
ensemble of nuclear geometries around the equilibrium structure,
computing the vertical excitation energies (ΔE) and oscillator
strengths (f) for each geometry, and phenomenologically broadening
each transition with a line shape function of empirical full width δ.
Frequently, the choice of δ is
made by visually seeking a trade-off between artificial vibronic
features (small δ) and over-smoothing of electronic signatures
(large δ). However, this approach is unsatisfactory,
as it relies on subjective perception and may lead to spectral inaccuracies,
especially when the number of sampled configurations is limited by
an excessive computational burden (high-level QM methods, complex
systems, solvent effects, etc.). In this work, we have developed and
tested a new approach to reconstruct NEA spectra, dubbed GMM-NEA,
based on Gaussian Mixture Models (GMMs), a probabilistic
machine learning algorithm, which circumvents the phenomenological
broadening assumption and, in turn, the use of δ altogether.
We show that GMM-NEA systematically outperforms other data-driven
models for automatically selecting δ, especially for small datasets.
In addition, we report the use of an algorithm to detect anomalous
QM computations (outliers) that can affect the overall shape and uncertainty
of the NEA spectra. Finally, we apply GMM-NEA to predict the photolysis
rate for HgBrOOH, a compound involved in Earth’s atmospheric
chemistry.
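
To make the role of δ concrete, the conventional phenomenological broadening described above can be sketched as follows. This is only an illustrative sketch, not the implementation used in this work: the function name `nea_band_shape`, the synthetic ΔE and f values, and the choice of a normalized Gaussian line shape are assumptions, and the physical prefactors needed to obtain an absolute cross section are omitted, so the result is a relative band shape only.

```python
import numpy as np

def nea_band_shape(grid, delta_e, f, fwhm):
    """Relative band shape: average over geometries of f_i times a
    normalized Gaussian of full width at half maximum `fwhm` centred
    at the vertical excitation energy delta_e_i (hypothetical helper)."""
    # Convert the full width at half maximum into a standard deviation.
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Energy differences, shape (n_grid, n_geometries), via broadcasting.
    diff = grid[:, None] - delta_e[None, :]
    gauss = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Weight each Gaussian by its oscillator strength and average.
    return (gauss * f[None, :]).mean(axis=1)

# Synthetic ensemble (assumed values, for illustration only): 50 geometries.
rng = np.random.default_rng(0)
delta_e = rng.normal(4.0, 0.2, size=50)   # vertical excitation energies, eV
f = rng.uniform(0.05, 0.15, size=50)      # oscillator strengths

grid = np.linspace(2.0, 6.0, 2001)        # energy grid, eV
narrow = nea_band_shape(grid, delta_e, f, fwhm=0.02)  # small delta: spiky
broad = nea_band_shape(grid, delta_e, f, fwhm=0.5)    # large delta: smooth
```

With a small ensemble such as this one, a narrow δ leaves one spike per sampled geometry, whereas a wide δ merges them into a single smooth band; both curves integrate to the mean oscillator strength, so only the shape depends on δ.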