A new diagnostic called the core consistency diagnostic (CORCONDIA) is suggested for determining the proper number of components for multiway models. It applies especially to the parallel factor analysis (PARAFAC) model, but also to other models that can be considered as restricted Tucker3 models. It is based on scrutinizing the 'appropriateness' of the structural model, given the data and the estimated parameters of gradually augmented models. A PARAFAC model (employing dimension-wise combinations of components for all modes) is called appropriate if adding other combinations of the same components does not improve the fit considerably. It is proposed to choose the largest model that is still sufficiently appropriate. Using examples from a range of different types of data, it is shown that the core consistency diagnostic is an effective tool for determining the appropriate number of components in, e.g., PARAFAC models. However, it is also shown, using simulated data, that the theoretical understanding of CORCONDIA is not yet complete.
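Below is a minimal numpy sketch of how such a core consistency value can be computed once a PARAFAC model has been fitted. The function name, the unfolding convention and the use of an explicit (memory-hungry) Kronecker product are my own simplifications; the idea is to compare the least-squares Tucker3 core implied by the fixed PARAFAC loadings with the ideal superdiagonal core of ones.

```python
import numpy as np

def corcondia(X, A, B, C):
    """Core consistency diagnostic for a fitted PARAFAC model (sketch).

    X       : ndarray, shape (I, J, K) -- three-way data array
    A, B, C : loading matrices, shapes (I, F), (J, F), (K, F)
    Returns a percentage; values near 100 suggest the trilinear structure
    is appropriate, values near or below zero suggest it is not.
    """
    I, J, K = X.shape
    F = A.shape[1]

    # Mode-1 unfolding with columns ordered so that X1[i, k*J + j] = X[i, j, k]
    X1 = X.transpose(0, 2, 1).reshape(I, K * J)

    # Least-squares Tucker3 core given the fixed PARAFAC loadings:
    # X1 ~ A @ G1 @ (C kron B).T   =>   G1 = A^+ @ X1 @ ((C kron B).T)^+
    G1 = np.linalg.pinv(A) @ X1 @ np.linalg.pinv(np.kron(C, B)).T

    # Ideal PARAFAC core: superdiagonal array of ones, unfolded the same way
    T = np.zeros((F, F, F))
    T[np.arange(F), np.arange(F), np.arange(F)] = 1.0
    T1 = T.transpose(0, 2, 1).reshape(F, F * F)

    # Sum of squared ideal core elements equals F
    return 100.0 * (1.0 - np.sum((G1 - T1) ** 2) / F)
```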
This paper presents a standardized notation and terminology to be used for three‐ and multiway analyses, especially when these involve (variants of) the CANDECOMP/PARAFAC model and the Tucker model. The notation also deals with basic aspects such as symbols for different kinds of products, and terminology for three‐ and higher‐way data. The choices for terminology and symbols to be used have to some extent been based on earlier (informal) conventions. Simplicity and reduction of the possibility of confusion have also played a role in the choices made. Copyright © 2000 John Wiley & Sons, Ltd.
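The "different kinds of products" for which symbols are standardized presumably include the Kronecker, Khatri-Rao (columnwise Kronecker) and Hadamard (elementwise) products that recur throughout multiway analysis. A small illustrative snippet (array names and sizes are arbitrary):

```python
import numpy as np

B = np.arange(6.0).reshape(3, 2)   # J x F loading matrix
C = np.arange(8.0).reshape(4, 2)   # K x F loading matrix

# Kronecker product: (K*J) x (F*F)
kron = np.kron(C, B)

# Khatri-Rao (columnwise Kronecker) product: (K*J) x F
khatri_rao = np.column_stack([np.kron(C[:, f], B[:, f])
                              for f in range(B.shape[1])])

# Hadamard (elementwise) product of two F x F cross-product matrices
hadamard = (C.T @ C) * (B.T @ B)
```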
A common problem in exploratory factor analysis is deciding how many factors to extract from a particular data set. We propose a new method for selecting the number of major common factors: the Hull method, which aims to find a model with an optimal balance between model fit and number of parameters. We examine the performance of the method in an extensive simulation study in which the simulated data are based on major and minor factors. The study compares the method with four other methods, including parallel analysis and the minimum average partial test, which were selected because they have been proven to perform well and/or are frequently used in applied research. The Hull method outperformed all four methods at recovering the correct number of major factors. Its usefulness was further illustrated by its assessment of the dimensionality of the Five-Factor Personality Inventory (Hendriks, Hofstee, & De Raad, 1999). This inventory has 100 items, and the typical methods for assessing dimensionality prove to be useless: the large number of factors they suggest has no theoretical justification. The Hull method, however, suggested retaining the number of factors that the theoretical background to the inventory actually proposes.
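A minimal sketch of the hull idea described above (not the authors' implementation: the goodness-of-fit measure, the hull construction and the tie-breaking rules are all simplified, and candidate models are assumed to have distinct parameter counts):

```python
import numpy as np

def hull_select(fit, n_params):
    """Pick the 'elbow' of the fit-versus-complexity curve, Hull-style.

    fit      : goodness-of-fit per candidate model (higher = better)
    n_params : number of free parameters per candidate model
    Returns the index of the selected model.
    """
    fit, n_params = np.asarray(fit, float), np.asarray(n_params, float)
    order = np.argsort(n_params)
    fit, n_params = fit[order], n_params[order]

    # Step 1: keep only models on the upper convex boundary of (n_params, fit)
    keep = [0]
    for i in range(1, len(fit)):
        if fit[i] <= fit[keep[-1]]:          # fit must improve with complexity
            continue
        while len(keep) >= 2:
            a, b = keep[-2], keep[-1]
            slope_ab = (fit[b] - fit[a]) / (n_params[b] - n_params[a])
            slope_ai = (fit[i] - fit[a]) / (n_params[i] - n_params[a])
            if slope_ab <= slope_ai:         # b falls below the hull, drop it
                keep.pop()
            else:
                break
        keep.append(i)

    # Step 2: scree-test value st (ratio of preceding to following slope)
    # for every interior hull point; fall back to the simplest model.
    best, best_st = keep[0], -np.inf
    for j in range(1, len(keep) - 1):
        p, q, r = keep[j - 1], keep[j], keep[j + 1]
        s1 = (fit[q] - fit[p]) / (n_params[q] - n_params[p])
        s2 = (fit[r] - fit[q]) / (n_params[r] - n_params[q])
        st = s1 / max(s2, 1e-12)
        if st > best_st:
            best, best_st = q, st
    return int(order[best])
```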
PARAFAC is a generalization of principal component analysis (PCA) to the situation where a set of data matrices is to be analysed. If each data matrix has the same row and column units, the resulting data are three‐way data and can be modelled by the PARAFAC1 model. If each data matrix has the same column units but different (numbers of) row units, the PARAFAC2 model can be used. Like the PARAFAC1 model, the PARAFAC2 model gives unique solutions under certain mild assumptions, while it is less severely constrained than PARAFAC1. It may therefore also be used for regular three‐way data in situations where the PARAFAC1 model is too restricted. Usually the PARAFAC2 model is fitted indirectly, i.e. to the set of cross‐product matrices between the column units. However, this model‐fitting procedure is computationally complex and inefficient. In the present paper a procedure for fitting the PARAFAC2 model directly to the set of data matrices is proposed. It is shown that this algorithm is more efficient than the indirect fitting algorithm. Moreover, it is more easily adjusted so as to allow for constraints on the parameter matrices, to handle missing data, as well as to handle generalizations to sets of three‐ and higher‐way data. Furthermore, with the direct fitting approach we also gain information on the row units, in the form of ‘factor scores’. As will be shown, this elaboration of the model in no way limits the feasibility of the method. Even though full information on the row units becomes available, the algorithm is based on the usually much smaller cross‐product matrices only. Copyright © 1999 John Wiley & Sons, Ltd.
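A compact numpy sketch of the direct-fitting idea, assuming the usual PARAFAC2 parameterization X_k ≈ P_k F diag(c_k) A' with columnwise-orthonormal P_k. This is an illustrative re-implementation, not the authors' reference algorithm, and it omits constraints, missing-data handling and higher-way generalizations:

```python
import numpy as np

def parafac2_direct(Xs, n_comp, n_iter=500, tol=1e-8, seed=0):
    """Direct ALS fitting of the PARAFAC2 model (illustrative sketch).

    Xs     : list of K matrices X_k, each (I_k x J); shared column mode,
             possibly different numbers of rows.
    n_comp : number of components R.
    Model  : X_k ~ P_k @ Fmat @ diag(C[k]) @ A.T,  with  P_k.T @ P_k = I.
    Returns (Ps, Fmat, A, C).
    """
    rng = np.random.default_rng(seed)
    J, K, R = Xs[0].shape[1], len(Xs), n_comp
    Fmat, A, C = np.eye(R), rng.standard_normal((J, R)), np.ones((K, R))
    prev_loss = np.inf

    for _ in range(n_iter):
        # 1) Columnwise-orthonormal P_k via an orthogonal-Procrustes step.
        Ps = [np.linalg.svd(Xk @ A @ np.diag(C[k]) @ Fmat.T,
                            full_matrices=False)[0] @
              np.linalg.svd(Xk @ A @ np.diag(C[k]) @ Fmat.T,
                            full_matrices=False)[2]
              for k, Xk in enumerate(Xs)]

        # 2) Project each slab onto its P_k: Y_k = P_k.T @ X_k  (R x J).
        Ys = [Ps[k].T @ Xs[k] for k in range(K)]

        # 3) One pass of PARAFAC (CP) ALS on the projected slabs.
        G = sum(np.outer(C[k], C[k]) * (A.T @ A) for k in range(K))
        Fmat = sum(Ys[k] @ A @ np.diag(C[k]) for k in range(K)) @ np.linalg.pinv(G)

        H = sum(np.outer(C[k], C[k]) * (Fmat.T @ Fmat) for k in range(K))
        A = sum(Ys[k].T @ Fmat @ np.diag(C[k]) for k in range(K)) @ np.linalg.pinv(H)

        W = np.linalg.pinv((Fmat.T @ Fmat) * (A.T @ A))
        C = np.vstack([W @ np.diag(Fmat.T @ Ys[k] @ A) for k in range(K)])

        # 4) Check the overall least-squares loss for convergence.
        loss = sum(np.linalg.norm(Xs[k] - Ps[k] @ Fmat @ np.diag(C[k]) @ A.T) ** 2
                   for k in range(K))
        if prev_loss - loss < tol * max(prev_loss, 1.0):
            break
        prev_loss = loss

    return Ps, Fmat, A, C
```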
In regression, cross-validation is an effective and popular approach that is used to decide, for example, the number of underlying features, and to estimate the average prediction error. The basic principle of cross-validation is to leave out part of the data, build a model, and then predict the left-out samples. While such an approach can also be envisioned for component models such as principal component analysis (PCA), most current implementations do not comply with the essential requirement that the predictions should be independent of the entity being predicted. Further, these methods have not been properly reviewed in the literature. In this paper, we review the most commonly used generic PCA cross-validation schemes and assess how well they work in various scenarios.
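As an illustration of one commonly used element-wise scheme (a Wold/EM-style "missing data" approach; the function name, the diagonal split pattern, the fixed iteration count and the single up-front centring are my own simplifications), a minimal numpy sketch:

```python
import numpy as np

def pca_press_em(X, n_comp, n_splits=7, n_iter=200, seed=0):
    """Element-wise PCA cross-validation (illustrative sketch).

    Left-out elements are treated as missing, the rank-n_comp model is
    refitted on the remaining elements by iterative imputation, and the
    imputed values serve as predictions.  Returns PRESS, the sum of
    squared prediction errors over all left-out elements.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                  # centre once (a simplification)
    n, p = X.shape
    # Assign every element to one of n_splits diagonal-style groups.
    groups = np.add.outer(np.arange(n), np.arange(p)) % n_splits

    press = 0.0
    for g in range(n_splits):
        mask = groups == g                  # True = left out in this round
        Xw = X.copy()
        Xw[mask] = 0.0                      # initial guess for the missing cells
        for _ in range(n_iter):
            # Rank-n_comp reconstruction of the current working matrix
            U, s, Vt = np.linalg.svd(Xw, full_matrices=False)
            Xhat = U[:, :n_comp] @ np.diag(s[:n_comp]) @ Vt[:n_comp]
            Xw[mask] = Xhat[mask]           # re-impute only the missing cells
        press += np.sum((X[mask] - Xhat[mask]) ** 2)
    return press
```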