Gaussian Mixture Models (GMMs) are among the most widely used methods for model-based clustering. They assume a multivariate Gaussian distribution for each component of the mixture, centered at the mean vector and with volume, shape and orientation determined by the covariance matrix. To reduce the large number of parameters introduced by the covariance matrices, parsimonious parameterizations of the latter have been proposed in the literature, e.g., the eigen-decomposition and the parsimonious GMMs based on mixtures of probabilistic principal component analyzers and mixtures of factor analyzers. We introduce a new parameterization of a covariance matrix by defining an extended ultrametric covariance matrix, and we implement it in a GMM. This structure can be used to describe multidimensional phenomena characterized by nested latent concepts with different levels of abstraction, from the most specific to the most general. The proposal is able to pinpoint a hierarchical structure on the variables for each component of the GMM, thus identifying a different characterization of a multidimensional phenomenon for each component (cluster, subpopulation) of the mixture. At the same time, it defines a new parsimonious GMM, since the ultrametric covariance structure reconstructs the relationships among variables with a limited number of parameters. The proposal is applied to synthetic and real data. On the former, it shows good classification performance when compared to the existing parameterizations; on the latter, it also provides insight into the hierarchical relationships among the variables for each cluster.
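To make the motivation for parsimonious parameterizations concrete, the following minimal sketch counts the free parameters of a GMM with unconstrained (full) covariance matrices. The function name and the example values of G = 3 components and p = 5, 10, 20 variables are illustrative assumptions, not taken from the paper; the count itself is the standard one (G - 1 mixing proportions, G·p means, G·p(p + 1)/2 covariance entries).

```python
def full_gmm_free_parameters(n_components: int, n_variables: int) -> int:
    """Free parameters of a GMM with one unconstrained covariance per component.

    Hypothetical helper for illustration only.
    """
    mixing = n_components - 1                    # proportions sum to one
    means = n_components * n_variables           # one mean vector per component
    # Each symmetric p x p covariance matrix has p(p + 1)/2 distinct entries.
    covariances = n_components * n_variables * (n_variables + 1) // 2
    return mixing + means + covariances


# Illustrative values: the covariance term grows quadratically in p.
for p in (5, 10, 20):
    print(p, full_gmm_free_parameters(3, p))
```

The covariance term dominates as p grows, which is why constrained covariance structures reduce the parameter count so sharply.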
Dimension reduction by means of Principal Component Analysis (PCA) is often employed to obtain a reduced set of components preserving the largest possible part of the total variance of the observed variables. Several methodologies have been proposed either to improve the interpretation of PCA results (e.g., by means of orthogonal or oblique rotations, or shrinkage methods), or to model oblique components or factors with a hierarchical structure, such as in Bi-factor and High-Order Factor analyses. In this paper, we propose a new methodology, called Hierarchical Disjoint Principal Component Analysis (HierDPCA), that aims at building a hierarchy of disjoint principal components of maximum variance associated with disjoint groups of observed variables, from Q up to a unique, general one. HierDPCA also allows choosing the type of relationship among the disjoint principal components of two consecutive levels, from the lowest upwards, by testing the component correlation at each level and switching from a reflective to a formative approach when this correlation turns out not to be statistically significant. The methodology is formulated in a semiparametric least-squares framework, and a coordinate descent algorithm is proposed to estimate the model parameters. A simulation study and two real applications are presented to highlight the empirical properties of the proposed methodology.
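The idea of disjoint principal components can be illustrated with a minimal sketch. It assumes the partition of variables into groups is already known and simply takes the first principal component of each group; it is not the HierDPCA algorithm, which estimates the partition and the hierarchy jointly. The function name, the random data and the grouping are hypothetical.

```python
import numpy as np


def disjoint_first_components(X, groups):
    """First principal component of each disjoint group of columns of X.

    Illustrative helper: `groups` is a fixed, assumed partition of column indices.
    Returns the component scores (n x Q) and the variance of each component.
    """
    Xc = X - X.mean(axis=0)                      # center the variables
    scores, variances = [], []
    for cols in groups:
        block = Xc[:, cols]
        cov = np.atleast_2d(np.cov(block, rowvar=False))
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        w = eigvecs[:, -1]                       # loading vector of maximum variance
        scores.append(block @ w)
        variances.append(eigvals[-1])
    return np.column_stack(scores), np.array(variances)


# Illustrative data: 50 observations, 5 variables, Q = 2 assumed groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
scores, variances = disjoint_first_components(X, [[0, 1], [2, 3, 4]])
print(scores.shape, variances)
```

Each variable loads on exactly one component, so every component can be read as a summary of its own group only.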
A Composite Indicator (CI) is a useful tool to synthesize information on a multidimensional phenomenon and to support policy decisions. Multidimensional phenomena are often modeled by hierarchical latent structures that reconstruct relationships between variables. In this paper, we propose an exploratory, simultaneous model for building a hierarchical CI system to synthesize a multidimensional phenomenon and analyze its several facets. The proposal, called the Ultrametric Composite Indicator (UCI) model, reconstructs the hierarchical relationships among manifest variables detected by the correlation matrix via an extended ultrametric correlation matrix. The latter has the feature of being associated one-to-one with a hierarchy of latent concepts. Furthermore, the proposal introduces a test to unravel relevant dimensions in the hierarchy and retain statistically significant higher-level CIs. A simulation study is presented to compare the proposal with existing methodologies. Finally, the UCI model is applied to study Italian municipalities’ behavior toward waste management and to provide a tool to guide their councils in policy decisions.