With the development of technology and the increasing availability of new instrumentation, multiblock data sets (e.g., a set of samples analyzed by different analytical techniques) are becoming more and more common and, as a consequence, how to handle this kind of data is a widely discussed topic. In such a context, where the number of variables involved is relatively high, selecting the most significant features is particularly relevant. For this reason, the possibility of combining a multiblock regression method, sequential and orthogonalized partial least-squares (SO-PLS), with a variable selection approach called covariance selection (CovSel) has been investigated. The resulting method, sequential and orthogonalized covariance selection (SO-CovSel), is similar to SO-PLS, but the feature reduction provided by PLS is instead performed by CovSel. Finally, predictions are made by applying multiple linear regression to the subset of selected variables. The novel approach has been tested on different multiblock data sets in both regression and classification (in combination with LDA), and it has been compared with another state-of-the-art multiblock method. SO-CovSel has proved suitable for its purpose: it has provided good predictions (in both regression and classification) and, from the interpretation point of view, it has led to a meaningful selection of the original variables.
variables in this framework has not widely been discussed yet.
In the present paper, the possibility of combining a multiblock latent variable-based method, sequential and orthogonalized partial least-squares (SO-PLS), 4 and a variable selection method, covariance selection (CovSel), 5 has been examined, and a novel multiblock variable selection method, named SO-CovSel, has been developed. Among the many methodologies proposed in the literature, CovSel and SO-PLS have been chosen because they have been shown to give good predictions and, at the same time, both have characteristics that make them particularly suitable for the interpretation of complex systems. Analogously to SO-PLSR and SO-PLS-LDA, 6 SO-CovSel can be applied in different contexts: it can be combined with regression or with linear discriminant analysis (LDA). 7 In both contexts, it has led to selections of variables exceptionally well suited to the interpretation of the systems under study, and it has provided predictions comparable with those obtained by applying another state-of-the-art method.
| MATERIAL AND METHODS
| Covariance selection
CovSel 5 is a variable selection method conceived to detect the most relevant features in a regression context. As the name suggests, the relevance of each variable is evaluated by estimating the covariance between each predictor and the responses. Once the variable with the highest covariance has been identified and selected, all the other predictors and the responses are orthogonalized with respect to it, and the procedure is repeated until the fixed amount of...
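The select-then-orthogonalize loop of CovSel can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the function name `covsel` and the parameter `n_select` are hypothetical, the procedure is taken to stop after a user-fixed number of variables, the data are mean-centred first, and, when there are several responses, each predictor is scored by the sum of its squared covariances with the responses.

```python
import numpy as np

def covsel(X, Y, n_select):
    """Greedy covariance-based selection (illustrative sketch only).

    Returns the column indices of X in the order they were selected.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]  # treat a single response as one column

    # Assumption: centre both blocks so that X.T @ Y is proportional
    # to the sample covariance between predictors and responses.
    Xr = X - X.mean(axis=0)
    Yr = Y - Y.mean(axis=0)

    selected = []
    for _ in range(n_select):
        cov = Xr.T @ Yr                   # shape (n_predictors, n_responses)
        score = (cov ** 2).sum(axis=1)    # assumed scoring rule
        if selected:
            score[selected] = -np.inf     # never pick a variable twice
        j = int(np.argmax(score))

        x = Xr[:, j].copy()
        norm2 = float(x @ x)
        if norm2 < 1e-12:
            break                         # nothing left to explain
        selected.append(j)

        # Orthogonalize all predictors and the responses with respect to
        # the selected variable by removing their projection onto it.
        Xr = Xr - np.outer(x, x @ Xr) / norm2
        Yr = Yr - np.outer(x, x @ Yr) / norm2

    return selected
```

Calling `covsel(X, y, 2)` on a predictor matrix `X` and a response `y` returns two column indices, most relevant first. Because each selected column is projected out of the remaining predictors, a variable that is strongly collinear with an earlier pick scores low in later iterations, so the selection tends to avoid redundant variables.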