Ultra-high-performance
liquid chromatography coupled to ion mobility
separation and high-resolution mass spectrometry instruments have
proven very valuable for the screening of emerging contaminants in the
aquatic environment. However, when suspect or nontarget approaches are applied
(i.e., when no reference standards are available),
no experimental retention time (RT) or collision cross-section
(CCS) values are available to support identification. In silico tools
for predicting RT and CCS can therefore be very useful for reducing the number
of candidates to investigate. In this work, Multivariate Adaptive Regression
Splines (MARS) were evaluated for the prediction of both RT and CCS.
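MARS builds a piecewise-linear model from mirrored pairs of hinge functions, max(0, x − t) and max(0, t − x), which are added greedily at the knot t that most reduces the residual error. The sketch below illustrates only that idea. It is a simplified, additive (degree-1) forward pass with no interaction terms, no backward pruning and no GCV criterion. It is not the authors' implementation, and the function names (`hinge`, `mars_forward`, `mars_predict`) and the descriptor matrix `X` are hypothetical.

```python
import numpy as np


def hinge(x, t, sign):
    """Hinge basis: max(0, x - t) for sign=+1, max(0, t - x) for sign=-1."""
    return np.maximum(0.0, sign * (x - t))


def mars_forward(X, y, max_terms=11):
    """Simplified MARS forward pass (additive, no backward pruning).

    X: (n, p) matrix of hypothetical molecular descriptors; y: RT or CCS.
    Returns the selected terms and their least-squares coefficients.
    """
    n, p = X.shape
    B = [np.ones(n)]          # basis columns, starting with the intercept
    terms = [(None, None, 0)]  # (descriptor index, knot, sign); None = intercept
    while len(B) + 2 <= max_terms:
        best = None
        for j in range(p):
            # Candidate knots: interior observed values of descriptor j
            for t in np.unique(X[:, j])[1:-1]:
                cand = np.column_stack(B + [hinge(X[:, j], t, +1),
                                            hinge(X[:, j], t, -1)])
                coef, *_ = np.linalg.lstsq(cand, y, rcond=None)
                rss = float(np.sum((y - cand @ coef) ** 2))
                if best is None or rss < best[0]:
                    best = (rss, j, t)
        if best is None:  # no candidate knots left
            break
        _, j, t = best
        # Add the mirrored hinge pair at the best knot
        B += [hinge(X[:, j], t, +1), hinge(X[:, j], t, -1)]
        terms += [(j, t, +1), (j, t, -1)]
    coef, *_ = np.linalg.lstsq(np.column_stack(B), y, rcond=None)
    return terms, coef


def mars_predict(X, terms, coef):
    """Evaluate the fitted hinge expansion on new descriptor rows."""
    cols = [np.ones(len(X)) if j is None else hinge(X[:, j], t, s)
            for j, t, s in terms]
    return np.column_stack(cols) @ coef
```

On a synthetic response with a single kink, for example `y = 1 + 2 * max(0, x - 0.5)`, a forward pass limited to three terms recovers the knot near 0.5 and reproduces `y` almost exactly. Full MARS would then prune terms backward using generalized cross-validation.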
MARS prediction models were developed and validated using a database
of 477 protonated molecules, 169 deprotonated molecules, and 249 sodium
adducts. Both multivariate and univariate models were evaluated, and the
univariate models showed a better fit to the experimental data. The RT
model (R² = 0.855) showed deviations between predicted and experimental
values of ±2.32 min (95% confidence interval). For protonated molecules,
the CCSH model (R² = 0.966) showed deviations of ±4.05% (95% confidence
interval). The CCSH model was also tested for the prediction of deprotonated
molecules, giving deviations below ±5.86% in 95% of the cases. Finally,
a third model was developed for sodium adducts (CCSNa, R² = 0.954), with
deviations below ±5.25% in 95% of the cases. The developed models have
been incorporated into an open-access, user-friendly online platform,
allowing third-party research laboratories to predict both RT and CCS values.
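The performance figures above combine two metrics: R² and a deviation bound that covers 95% of the cases. The sketch below is an assumption about how such numbers can be computed. RT deviations are taken in minutes, CCS deviations as a percentage of the experimental value, and the bound is the 95th percentile of the absolute deviations. The abstract does not state the exact procedure, and mean ± 1.96·SD is a common alternative. The helper names are hypothetical.

```python
import numpy as np


def r_squared(y_exp, y_pred):
    """Coefficient of determination of predicted vs. experimental values."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_exp - y_pred) ** 2)
    ss_tot = np.sum((y_exp - y_exp.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def deviation_bound(y_exp, y_pred, relative=False, coverage=0.95):
    """Absolute deviation not exceeded in `coverage` of the cases.

    relative=False: deviation in the data's units (e.g., minutes for RT).
    relative=True: deviation in % of the experimental value (e.g., CCS).
    """
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    dev = y_pred - y_exp
    if relative:
        dev = 100.0 * dev / y_exp
    return float(np.quantile(np.abs(dev), coverage))


# Hypothetical CCS values (Å²): relative deviations are +2%, -2%, 0%, 0%
ccs_exp = [100.0, 200.0, 150.0, 120.0]
ccs_pred = [102.0, 196.0, 150.0, 120.0]
print(deviation_bound(ccs_exp, ccs_pred, relative=True))  # → 2.0
```

Using the relative form for CCS and the absolute form for RT matches how the results are reported in the abstract (percent vs. minutes).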