Caffeine, quinic acid, and nicotinic acid are among the significant
chemical determinants of coffee quality. This study develops a chemometric
model to quantify these compounds in ternary mixtures analyzed by
terahertz time-domain spectroscopy (THz-TDS). A data set of 480 THz
spectra was obtained from 80 samples. Combinations of data preprocessing
methods, comprising normalization (Z-score, min-max scaling, Mie baseline
removal) and dimensionality reduction (principal component analysis (PCA),
factor analysis (FA), independent component analysis (ICA), locally linear
embedding (LLE), non-negative matrix factorization (NMF), Isomap), with
prediction models (partial least-squares regression (PLSR), support vector
regression (SVR), multilayer perceptron (MLP), convolutional neural
network (CNN), gradient boosting) were evaluated for their prediction
performance, totaling 4,711,685 combinations.
The best quantification performance, a root-mean-square error of
prediction (RMSEP) of 0.0254 (dimensionless mass ratio), was achieved
using min-max scaling and factor analysis for data preprocessing and a
multilayer perceptron for prediction. The effects of preprocessing, a
comparison of prediction models, and the linearity of the data are
discussed.
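
As a rough illustration of the best-performing combination named above
(min-max scaling, factor analysis, multilayer perceptron, scored by RMSEP),
the following is a minimal sketch using scikit-learn. It is not the study's
implementation: the spectra are synthetic stand-ins, and the number of
frequency points, the number of factors, the network size, and the
train/test split are all assumptions chosen only to make the example run.
MinMaxScaler scales each feature (frequency point) independently, which may
differ from the scaling convention used in the study.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 480 "spectra" with 200 points each,
# built as noisy linear mixtures of three made-up component spectra.
n_spectra, n_points = 480, 200
y = rng.dirichlet(np.ones(3), size=n_spectra)   # mass ratios, rows sum to 1
pure = rng.random((3, n_points))                # hypothetical pure spectra
X = y @ pure + 0.01 * rng.standard_normal((n_spectra, n_points))

# Hold out a quarter of the spectra for prediction (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Min-max scaling -> factor analysis -> multilayer perceptron.
# n_components and hidden_layer_sizes are illustrative guesses.
model = make_pipeline(
    MinMaxScaler(),
    FactorAnalysis(n_components=5, random_state=0),
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Root-mean-square error of prediction over all three mass ratios.
rmsep = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print(f"RMSEP on synthetic data: {rmsep:.4f}")
```

With several spectra recorded per sample, as in the data set described
above, a split that keeps all spectra of one sample on the same side (for
example scikit-learn's GroupShuffleSplit) would likely give a more honest
error estimate than the simple random split used here.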