Applications of machine learning in chemistry are many and varied,
ranging from the prediction of structure–property relationships to the
modeling of potential energy surfaces for large-scale atomistic simulations.
We describe a generalized approach to applying machine learning
to the classification of spectra, an approach that can serve as the
basis for a wide variety of undergraduate projects. While our examples
use FTIR and mass spectra, the approach could equally well be used
with UV–visible, Raman, NMR, or indeed any other type of spectra.
We summarize a number of different unsupervised and supervised machine
learning algorithms that can be used to classify spectra into groups,
and illustrate their application using data from three different projects
carried out by fourth-year chemistry undergraduates. The three projects
investigated the ability of the various machine learning approaches
to correctly classify spectra of a variety of fruits, whiskies, and
teas, respectively. In all cases the algorithms were able to differentiate
between the various samples used in each study, and the trained machine
learning models could then be used to classify unknown samples with
a high degree of accuracy (>98% in many cases). Depending on the extent
to which students are expected to write their own code to perform
the data analysis, the general model adopted in this work can be adapted
for a variety of purposes, from short (one- to two-day) practical exercises
and workshops, to much longer independent student projects.
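
As a rough illustration of the supervised classification workflow summarized above, the following minimal Python sketch trains a classifier on labeled spectra and then assigns held-out spectra to classes. It rests on several assumptions that are not taken from the projects themselves: the spectra are synthetic (Gaussian peaks plus noise on an arbitrary 0 to 1 axis), the class names `sample_A`, `sample_B`, and `sample_C` are hypothetical, and a simple nearest-centroid rule stands in for whichever algorithms a given project might actually use.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical classes, each defined by peak positions on an arbitrary 0-1 axis.
CLASS_PEAKS = {
    "sample_A": [0.20, 0.50],
    "sample_B": [0.20, 0.60],
    "sample_C": [0.30, 0.80],
}


def synthetic_spectrum(peak_centers, n_points=200, noise=0.02):
    """Return Gaussian peaks plus random noise (synthetic, not measured data)."""
    spectrum = []
    for i in range(n_points):
        x = i / (n_points - 1)
        signal = sum(math.exp(-((x - c) / 0.03) ** 2) for c in peak_centers)
        spectrum.append(signal + rng.gauss(0.0, noise))
    return spectrum


def normalize(spectrum):
    """Scale to unit Euclidean length so overall intensity does not dominate."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum]


def fit_centroids(spectra, labels):
    """Training step: compute the mean spectrum of each class."""
    grouped = {}
    for spectrum, label in zip(spectra, labels):
        grouped.setdefault(label, []).append(spectrum)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, spectrum):
    """Assign a spectrum to the class whose mean spectrum is closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], spectrum))


def make_dataset(n_per_class):
    """Generate normalized synthetic spectra with their class labels."""
    spectra, labels = [], []
    for label, peaks in CLASS_PEAKS.items():
        for _ in range(n_per_class):
            spectra.append(normalize(synthetic_spectrum(peaks)))
            labels.append(label)
    return spectra, labels


# Separate training and held-out sets, so accuracy is measured on unseen spectra.
train_spectra, train_labels = make_dataset(20)
test_spectra, test_labels = make_dataset(10)

centroids = fit_centroids(train_spectra, train_labels)
predictions = [predict(centroids, s) for s in test_spectra]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f"Held-out accuracy on synthetic spectra: {accuracy:.2f}")
```

With measured data, `make_dataset` would be replaced by code that loads and preprocesses the recorded spectra. The accuracy obtained on this synthetic set says nothing about the accuracy achievable with real fruit, whisky, or tea samples.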