In psychology, many studies measure the same variables in different groups. When the number of variables is large and a strong a priori idea about the underlying latent constructs is lacking, researchers often start by reducing the variables to a few principal components in an exploratory way. One then often wants to evaluate whether the components represent the same constructs in the different groups. To this end, it makes sense to remove outlying variables: variables with markedly different loadings on the extracted components across the groups, which hamper equivalent interpretations of the components. Moreover, identifying such outlying variables is important when testing theories about which variables behave similarly or differently across groups. In this article, we first scrutinize the lower bound congruence method (LBCM; De Roover, Timmerman, & Ceulemans, Behavior Research Methods, 49, 216-229, 2017), which was recently proposed for solving the outlying-variable detection problem. LBCM investigates how Tucker's congruence between the group-specific loading matrices improves when specific variables are discarded. We show that LBCM tends to flag outlying variables that either are false positives or concern very small, and thus practically insignificant, loading differences. To address this issue, we present a new heuristic: the lower and resampled upper bound congruence method (LRUBCM). This method uses a resampling technique to obtain a sampling distribution for the congruence coefficient under the hypothesis that no outlying variable is present. In a simulation study, we show that LRUBCM outperforms LBCM. Finally, we illustrate the use of the method by means of empirical data.
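To make the congruence logic concrete, the following minimal Python/numpy sketch illustrates the two ingredients: Tucker's congruence between group-specific loading vectors, and a resampled null distribution of the congruence under the no-outlying-variable hypothesis. This is our own illustration, not the authors' LRUBCM implementation; the function names and the label-permutation scheme are hypothetical stand-ins for the paper's actual resampling procedure.

    import numpy as np

    def tucker_congruence(x, y):
        # Tucker's phi: sum(x*y) / sqrt(sum(x^2) * sum(y^2))
        return (x @ y) / np.sqrt((x @ x) * (y @ y))

    def pca_loadings(X, n_comp):
        # Loadings of the first n_comp principal components of centered X
        Xc = X - X.mean(axis=0)
        _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[:n_comp].T * (s[:n_comp] / np.sqrt(Xc.shape[0] - 1))

    def null_congruence(X1, X2, n_comp, n_rep=1000, seed=0):
        # Approximate the sampling distribution of the mean congruence
        # under the hypothesis that both groups share one loading
        # structure, by randomly reassigning rows to the two groups
        # (a hypothetical scheme; the published LRUBCM resampling
        # differs in its details)
        rng = np.random.default_rng(seed)
        pooled = np.vstack([X1, X2])
        n1 = X1.shape[0]
        phis = np.empty(n_rep)
        for r in range(n_rep):
            idx = rng.permutation(pooled.shape[0])
            A = pca_loadings(pooled[idx[:n1]], n_comp)
            B = pca_loadings(pooled[idx[n1:]], n_comp)
            # sign handled via abs; stable component order is assumed,
            # which is a simplification
            phis[r] = np.mean([abs(tucker_congruence(A[:, k], B[:, k]))
                               for k in range(n_comp)])
        return phis

An observed between-group congruence falling in the lower tail of this resampled distribution would then signal that some variable is outlying, which is the intuition behind comparing lower and resampled upper bounds.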
Principal covariates regression (PCovR) allows one to deal with the interpretational and technical problems associated with running an ordinary regression with many predictor variables. In PCovR, the predictor variables are reduced to a limited number of components and, simultaneously, the criterion variables are regressed on these components. By means of a weighting parameter, users can flexibly choose how much they want to emphasize reconstruction and prediction. However, when datasets contain many criterion variables, PCovR users face new interpretational problems, because many regression weights are obtained and because some criteria might be unrelated to the predictors. We therefore propose PCovR2, which extends PCovR by also reducing the criteria to a few components. These criterion components are predicted from the predictor components. The PCovR2 weighting parameter can again be used flexibly to focus on the reconstruction of the predictors and criteria, or on filtering out the relevant predictor components and predictable criterion components. We compare PCovR2 to two other approaches, based on partial least squares (PLS) and principal components regression (PCR), that also reduce the criteria and are therefore called PLS2 and PCR2. By means of a simulated example, we show that PCovR2 outperforms PLS2 and PCR2 when one aims to recover all relevant predictor components and predictable criterion components. Moreover, we conduct a simulation study to evaluate how well PCovR2, PLS2, and PCR2 succeed in finding (1) all underlying components and (2) the subset of relevant predictor components and predictable criterion components. Finally, we illustrate the use of PCovR2 by means of empirical data.
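To show how the weighting parameter trades off reconstruction against prediction, here is a minimal numpy sketch of base PCovR (not PCovR2 itself), using the known eigendecomposition route in which component scores are the leading eigenvectors of a weighted mix of predictor variance and criterion-relevant variance; the function name and variable names are ours.

    import numpy as np

    def pcovr(X, Y, n_comp, alpha):
        # Base PCovR sketch: component scores T blend reconstruction of X
        # (weight alpha) with prediction of Y (weight 1 - alpha)
        Hx = X @ np.linalg.pinv(X.T @ X) @ X.T        # projector onto col(X)
        G = (alpha * (X @ X.T) / np.sum(X ** 2)
             + (1 - alpha) * (Hx @ Y @ Y.T @ Hx) / np.sum(Y ** 2))
        vals, vecs = np.linalg.eigh(G)                # G is symmetric
        T = vecs[:, np.argsort(vals)[::-1][:n_comp]]  # top eigenvectors as scores
        Px = np.linalg.lstsq(T, X, rcond=None)[0].T   # loadings of the predictors
        Py = np.linalg.lstsq(T, Y, rcond=None)[0].T   # regression weights for Y
        return T, Px, Py

With alpha near 1 the components approach the principal components of X (PCR-like behavior); with alpha near 0 they focus on predicting Y (reduced-rank-regression-like behavior). PCovR2, as described above, additionally reduces Y to a few criterion components that are regressed on the predictor components.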
Multivariate multigroup data are collected in many fields of science, where the so-called groups pertain to, for instance, experimental conditions or the countries in which participants are nested. To summarize the main information in such data, principal component analysis (PCA) is highly popular. PCA reduces the variables to a few components that are linear combinations of the original variables. Researchers usually assume those components to be the same across the groups and therefore aim to apply a simultaneous component analysis. To investigate whether this assumption is reasonable, one often analyzes the groups separately and computes a similarity index between the group-specific component loadings of the variables. In many cases, however, most variables have highly similar loadings across the groups, while a few variables, which we will call "outlying variables," behave differently, indicating that a simultaneous analysis is not warranted. In such cases, the outlying variables should be removed before proceeding with the simultaneous analysis. To identify them, the variables are ranked according to their relative outlyingness. Although some procedures have been proposed that yield such an outlyingness ranking, they might not be optimal, because they all rely on the same choice of similarity coefficient without evaluating alternatives. In this paper, we give an overview of other options and report extensive simulations investigating how this choice affects the correctness of the outlyingness ranking. We also illustrate the added value of the outlying-variable approach by means of sensometric data on different bread samples.
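The ranking idea can be illustrated with a short leave-one-variable-out sketch in numpy. This is our own illustration with hypothetical names, using Tucker's congruence as just one of the similarity coefficients the paper compares: for each variable, its row is dropped from both group-specific loading matrices, and the variable is scored by how much the mean congruence improves.

    import numpy as np

    def mean_congruence(A, B):
        # Mean absolute Tucker congruence over matching components of two
        # group-specific loading matrices (variables x components)
        phis = [abs(A[:, k] @ B[:, k])
                / np.sqrt((A[:, k] @ A[:, k]) * (B[:, k] @ B[:, k]))
                for k in range(A.shape[1])]
        return float(np.mean(phis))

    def outlyingness_ranking(A, B):
        # Rank variables by the gain in between-group congruence when
        # the variable is left out; most outlying variable comes first
        base = mean_congruence(A, B)
        gains = [mean_congruence(np.delete(A, j, axis=0),
                                 np.delete(B, j, axis=0)) - base
                 for j in range(A.shape[0])]
        return np.argsort(gains)[::-1]

Swapping Tucker's congruence in mean_congruence for another similarity coefficient is exactly the kind of choice whose effect on the correctness of the ranking the reported simulations evaluate.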