Soil is commonly collected from outdoor crime scenes and can help link a suspect and a victim to a crime scene. The chemical profiles of soils can be acquired with analytical instruments such as ultra-performance liquid chromatography (UPLC). However, UPLC chromatograms are often affected by an unstable baseline. In this paper, we compared the performance of five baseline correction (BC) algorithms, i.e., asymmetric least squares (AsLS), fill peak (FP), iterative restricted least squares, median window (MW), and modified polynomial fitting, in discriminating 30 chromatograms of brownish soils by five locations of origin, i.e., PP, HK, KU, BL and KB. The preprocessed sub-datasets were first inspected visually through their mean chromatograms and then explored further via scores plots from principal component analysis (PCA). Finally, the predictive performance of partial least squares-discriminant analysis (PLS-DA) models, estimated from 1000 pairs of training and testing samples prepared by iterative random resampling at a 75:25 split, was examined to identify the best BC method. The mean raw chromatograms of the ten soil samples differed from one another and showed evidently fluctuating baselines. The AsLS- and MW-corrected chromatograms showed the greatest improvement over their raw counterparts. Meanwhile, the PCA scores plots revealed that most of the sub-datasets produced three separate clusters. The sub-datasets were then modelled via PLS-DA. MW emerged as the best BC method based on the mean prediction accuracy estimated from the 1000 pairs of training and testing samples. In conclusion, MW outperformed the other BC methods in correcting the UPLC data of soil.
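To make the two central steps concrete, the following is a minimal sketch, not the implementation used in this study, of a median-window baseline estimate and of one random 75:25 split of sample indices. It assumes the simplest form of the MW idea (a running median subtracted from the signal, without any additional smoothing step); the function names, the `half_window` parameter and the toy signal are hypothetical and chosen only for illustration.

```python
import random
from statistics import median


def median_window_baseline(signal, half_window):
    """Estimate a baseline as the running median over a centred window.

    Assumption: the window is simply truncated at both ends of the signal.
    """
    n = len(signal)
    baseline = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        baseline.append(median(signal[lo:hi]))
    return baseline


def correct_baseline(signal, half_window):
    """Subtract the running-median baseline from the signal."""
    baseline = median_window_baseline(signal, half_window)
    return [s - b for s, b in zip(signal, baseline)]


def random_split(n_samples, train_fraction=0.75, rng=None):
    """Return one random (train, test) partition of the sample indices."""
    rng = rng or random.Random()
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Toy chromatogram: a linear drift with one narrow peak at index 10.
toy_signal = [0.1 * i for i in range(21)]
toy_signal[10] += 5.0
corrected = correct_baseline(toy_signal, half_window=3)

# One of the repeated splits; the study repeats such a split 1000 times.
train_idx, test_idx = random_split(30, rng=random.Random(0))
```

The window must be wider than the peaks of interest, otherwise the running median follows the peaks and removes them together with the baseline. Repeating `random_split` with different seeds and averaging the prediction accuracy over all repetitions gives the kind of mean accuracy used to compare the BC methods.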
Key points