Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has
caused the worst global health crisis in living memory. Reverse
transcription quantitative polymerase chain reaction (RT-qPCR) is considered the
gold standard diagnostic method, but it exhibits limitations in the
face of enormous demands. We evaluated a mid-infrared (MIR) data set
of 237 saliva samples obtained from symptomatic patients (138 COVID-19
infections diagnosed via RT-qPCR). MIR spectra were evaluated via
unsupervised random forest (URF) and supervised classification models: linear
discriminant analysis (LDA) after variable selection by a genetic algorithm
(GA-LDA) or the successive projections algorithm (SPA-LDA), partial least
squares discriminant analysis (PLS-DA), and a combination of dimension reduction
and variable selection by particle swarm optimization (PSO-PLS-DA). Additionally,
a consensus class was used. URF models can identify structures even
in highly complex data. Individual models performed well, but the
consensus class improved the validation performance to 85% accuracy,
93% sensitivity, 83% specificity, and a Matthews correlation
coefficient value of 0.69, with information at different spectral
regions. This combined unsupervised and supervised framework
therefore makes it possible to better highlight the spectral regions
associated with positive samples, including lipid (∼1700 cm⁻¹), protein (∼1400 cm⁻¹),
and nucleic acid (∼1200–950 cm⁻¹)
regions. This methodology offers a fast, noninvasive diagnostic
tool that can reduce costs and support risk reduction
strategies.
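
The consensus class and the reported validation metrics can be illustrated with a minimal sketch. The abstract does not state how the consensus was formed, so the majority vote with ties assigned to the positive class is an assumption, as are the function names `consensus_vote` and `binary_metrics` and the toy labels, which are invented for illustration and are not study data.

```python
from math import sqrt

def consensus_vote(model_predictions):
    """Majority vote over per-model binary predictions (1 = positive).

    Assumption: ties are assigned to the positive class. The source
    does not specify the consensus rule.
    """
    n_models = len(model_predictions)
    consensus = []
    for votes in zip(*model_predictions):  # one tuple of votes per sample
        positives = sum(votes)
        consensus.append(1 if 2 * positives >= n_models else 0)
    return consensus

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and Matthews correlation coefficient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Toy labels only (hypothetical): 1 = RT-qPCR positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = {
    "GA-LDA":     [1, 1, 0, 0, 0, 1, 1, 0],
    "SPA-LDA":    [1, 0, 1, 0, 1, 0, 1, 0],
    "PLS-DA":     [1, 1, 0, 0, 0, 0, 0, 0],
    "PSO-PLS-DA": [0, 1, 1, 1, 1, 0, 1, 0],
}

consensus = consensus_vote(list(predictions.values()))
metrics = binary_metrics(y_true, consensus)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With an even number of models, the tie rule matters: sending ties to the positive class favors sensitivity over specificity, which may or may not match the rule used in the study.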