Machine learning has significantly increased the value of spectroscopic
characterization tools, particularly in biomedical applications, because
it can detect latent patterns within complex spectral data.
However, it often requires extensive data preprocessing, including
baseline correction and denoising, which can introduce unintentional
bias during classification. To address this, we developed two deep
learning methods capable of fully preprocessing raw Raman spectroscopy
data without any human input. First, cascaded deep convolutional neural
networks (CNNs) based on either ResNet or U-Net architectures were
trained on randomly generated spectra with augmented defects. Then,
they were tested using simulated Raman spectra, surface-enhanced Raman
spectroscopy (SERS) imaging of chemical species, low-resolution Raman
spectra of human bladder cancer tissue, and finally, classification
of SERS spectra from human placental extracellular vesicles (EVs).
Both approaches resulted in faster training and complete spectral
preprocessing in a single step, with greater speed, defect tolerance,
and classification accuracy than conventional methods. These
findings indicate that cascaded CNN preprocessing is ideal for biomedical
Raman spectroscopy applications in which large numbers of heterogeneous
spectra with diverse defects need to be automatically, rapidly, and
reproducibly preprocessed.
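
To make the training-data idea concrete, the following is a minimal sketch of how a randomly generated spectrum with augmented defects might be paired with its clean target. It is an illustration only, not the procedure used in this work: the function name `make_training_pair`, the Lorentzian peak shape, the cubic polynomial baseline, the Gaussian noise model, and all parameter ranges are assumptions chosen for simplicity.

```python
import random


def make_training_pair(n_points=512, n_peaks=5, noise_sd=0.02, seed=None):
    """Return (raw, clean): one synthetic spectrum with and without defects.

    All shapes and ranges here are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    x = [i / (n_points - 1) for i in range(n_points)]  # normalized axis, 0..1

    # Clean target: a sum of Lorentzian peaks at random positions.
    clean = [0.0] * n_points
    for _ in range(n_peaks):
        center = rng.uniform(0.05, 0.95)
        width = rng.uniform(0.003, 0.02)
        height = rng.uniform(0.2, 1.0)
        for i, xi in enumerate(x):
            clean[i] += height * width**2 / ((xi - center) ** 2 + width**2)

    # Defect 1 (assumed): a smooth baseline from a random cubic polynomial,
    # shifted so that it is never negative.
    c = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    baseline = [c[0] + c[1] * xi + c[2] * xi**2 + c[3] * xi**3 for xi in x]
    floor = min(baseline)
    baseline = [b - floor for b in baseline]

    # Defect 2 (assumed): additive Gaussian noise on every point.
    raw = [s + b + rng.gauss(0.0, noise_sd) for s, b in zip(clean, baseline)]
    return raw, clean


raw, clean = make_training_pair(seed=0)
```

In a setup like this, a network would receive `raw` as input and be trained to reproduce `clean`, so that baseline correction and denoising are learned together rather than applied as separate manual steps.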