Extended multiplicative signal correction (EMSC) is a widely used framework for preprocessing spectral data. In the EMSC framework, spectra are scaled according to a given reference spectrum. Spectra that are far from collinear with the selected reference spectrum may not be scaled appropriately. To remedy this issue, an extension of the EMSC framework is proposed that allows multiple reference spectra to be incorporated in the EMSC model. Useful candidate reference spectra can be obtained from the dominant right singular vectors of the matrix of spectra, but any desired reference spectra can be used. As part of this extension, we propose changing the basis used in the EMSC preprocessing to an orthonormal basis. Using an orthonormal basis removes confounding between the basis vectors and makes the obtained EMSC model simpler to interpret. We discuss the proposed modification theoretically and demonstrate its use with two data sets of Raman spectra and modelling with partial least squares regression and Tikhonov regularization. The data sets used are Raman spectra of oil samples from salmon with iodine value as the response and Raman spectra of an emulsion of water, whey protein, and different oils with polyunsaturated fatty acids as response (both as percentage of total fat content and of total weight).
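The correction described above can be sketched in a few lines of NumPy. This is a generic illustration, not the authors' exact implementation: the reference spectra are taken from the dominant right singular vectors, combined with polynomial baseline terms, and the full basis is orthonormalised (here via QR, an assumption on our part) before each spectrum is baseline-corrected and scaled.

```python
import numpy as np

def emsc_correct(spectra, n_refs=1, poly_order=2):
    """Sketch of EMSC-style correction with multiple SVD-derived
    reference spectra and an orthonormalised basis (hypothetical
    implementation, for illustration only)."""
    n_samples, n_channels = spectra.shape
    x = np.linspace(-1.0, 1.0, n_channels)
    # Polynomial baseline columns: 1, x, x^2, ...
    poly = np.vander(x, poly_order + 1, increasing=True)
    # Candidate references: dominant right singular vectors of the data
    _, _, vt = np.linalg.svd(spectra, full_matrices=False)
    refs = vt[:n_refs].T
    # Orthonormalise references and baseline terms jointly,
    # removing confounding between the basis vectors
    q, _ = np.linalg.qr(np.hstack([refs, poly]))
    coefs = spectra @ q  # projections onto the orthonormal basis
    # Subtract the fitted baseline part and scale by the coefficient
    # on the leading reference direction
    baseline = coefs[:, n_refs:] @ q[:, n_refs:].T
    scale = coefs[:, 0:1]
    return (spectra - baseline) / scale
```

With an orthonormal basis the projection coefficients are obtained by a single matrix product rather than a least-squares solve, which is part of what makes the model simpler to interpret.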
Multiway datasets arise in various situations, typically from specialised measurement technologies, as a result of measuring data over varying conditions in multiple dimensions, or simply as sets of possibly multichannel images. When such measurements are intended for predicting some external properties, the number of available methods is limited. Multilinear partial least squares (PLS) is among the few available options. In the present work, we generalise the canonical partial least squares framework to handle multiway data. We demonstrate that the resulting multiway data analysis method is capable of building parsimonious models, encompassing continuous and categorical responses—both single and multiple—in a unifying framework. This also enables the inclusion of additional responses/information that can contribute to more parsimonious models. Finally, we achieve a considerable advantage in computational speed without sacrificing numerical precision by deflating the responses and orthogonalising scores rather than performing the more costly deflations of the predictor data.
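The computational shortcut mentioned at the end—deflating only the response while orthogonalising scores, instead of repeatedly deflating the predictor matrix—can be illustrated with an ordinary (two-way) PLS1 sketch. This is a generic, simplified illustration of the idea, not the authors' multiway algorithm:

```python
import numpy as np

def pls1_ydeflate(X, y, n_comp):
    """PLS1 sketch that leaves X untouched: scores are orthogonalised
    by Gram-Schmidt and only the response is deflated (illustrative
    sketch of the speed-up strategy, not the paper's method)."""
    n, p = X.shape
    T = np.zeros((n, n_comp))   # orthonormal scores
    W = np.zeros((p, n_comp))   # loading weights
    r = y.astype(float)         # response residual
    for a in range(n_comp):
        w = X.T @ r
        w /= np.linalg.norm(w)
        t = X @ w               # X is never deflated
        if a > 0:
            # Orthogonalise against earlier scores instead
            t -= T[:, :a] @ (T[:, :a].T @ t)
        t /= np.linalg.norm(t)
        r = r - t * (t @ r)     # deflate the response only
        T[:, a], W[:, a] = t, w
    return T, W
```

Deflating an `n × 1` response is far cheaper than deflating an `n × p` predictor block at every component, which is where the speed advantage comes from; for multiway data the predictor block is larger still, so the saving grows.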
Spectroscopic data are usually perturbed by noise from various sources that should be removed prior to model calibration. After conducting a preprocessing step to eliminate unwanted multiplicative effects (effects that scale the pure signal in a multiplicative manner), we discuss how to correct a model for unwanted additive effects in the spectra. Our approach is described within the Tikhonov regularization (TR) framework for linear regression model building, and our focus is on ignoring the influence of noninformative polynomial trends. This is achieved by including an additional criterion in the TR problem that penalizes the resulting regression coefficients away from a selected set of possibly disturbing directions in the sample space. The presented method builds on the extended multiplicative signal correction, and we compare the two approaches on several real data sets, showing that the suggested TR-based method may improve the predictive power of the resulting model. We discuss the possibilities of imposing smoothness in the calculation of regression coefficients as well as imposing selection of wavelength regions within the TR framework. To implement TR efficiently in the model building, we use an algorithm that is heavily based on the singular value decomposition. Because of some favorable properties of the singular value decomposition, it is possible to explore the models (including their generalized cross-validation error estimates) associated with a large number of regularization parameter values at low computational cost.
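The SVD-based efficiency argument can be made concrete with a standard ridge (Tikhonov) example: after one SVD of the predictor matrix, the coefficients and the generalized cross-validation (GCV) score for any regularization parameter follow from cheap vector operations. The sketch below shows the generic single-penalty case, not the paper's multi-penalty formulation with disturbing-direction terms:

```python
import numpy as np

def ridge_gcv_svd(X, y, lambdas):
    """Ridge regression solved once via SVD, then evaluated over a grid
    of regularization parameters with GCV (generic sketch of the SVD
    trick; the paper's penalties are more elaborate)."""
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    uty = u.T @ y          # computed once, reused for every lambda
    n = X.shape[0]
    results = []
    for lam in lambdas:
        f = s**2 / (s**2 + lam)        # shrinkage filter factors
        beta = vt.T @ (f * uty / s)    # regression coefficients
        fitted = u @ (f * uty)
        edf = f.sum()                  # effective degrees of freedom
        gcv = n * np.sum((y - fitted) ** 2) / (n - edf) ** 2
        results.append((lam, beta, gcv))
    return results
```

Each additional value of the regularization parameter costs only O(n·rank) arithmetic after the initial factorization, which is why a dense grid of candidate models can be screened cheaply.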
In various situations requiring empirical model building from highly multivariate measurements, modelling based on partial least squares regression (PLSR) may often provide efficient low‐dimensional model solutions. In unsupervised situations, the same may be true for principal component analysis (PCA). In both cases, however, it is also of interest to identify subsets of the measured variables useful for obtaining sparser but still comparable models without significant loss of information and performance. In the present paper, we propose a voting approach for sparse overall maximisation of variance analogous to PCA, and a closely related alternative for deriving sparse regression models in the spirit of the PLSR method. Both cases yield pivoting strategies for a modified Gram–Schmidt process and its corresponding (partial) QR‐factorisation of the underlying data matrix to manage the variable selection process. The proposed methods include score and loading plot possibilities that are acknowledged for providing efficient interpretations of the related PCA and PLS models in chemometric applications.
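The connection between pivoting and variable selection can be shown with column-pivoted QR: the pivot order greedily ranks variables by how much new variance each explains given those already chosen. The sketch below uses SciPy's standard pivoted QR as a stand-in; the paper's voting strategy for choosing pivots is more refined than this plain greedy rule:

```python
import numpy as np
from scipy.linalg import qr

def select_variables_qr(X, k):
    """Greedy variable selection via column-pivoted QR on the centred
    data matrix (illustrative stand-in for the paper's pivoting/voting
    strategy)."""
    Xc = X - X.mean(axis=0)   # centre columns, as in PCA
    # pivoting=True reorders columns by remaining norm at each step,
    # i.e. by unexplained variance given the columns already selected
    _, _, piv = qr(Xc, mode='economic', pivoting=True)
    return piv[:k]            # indices of the k selected variables
```

Because the pivoted QR is built from a modified Gram–Schmidt-style elimination, the selected columns form a well-conditioned subset that approximately spans the same space as the leading principal components.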