The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and general scientific literature. The aim of this article is to provide an up-to-date account of the theory and methodological developments underlying the applications of finite mixture models. Because of their flexibility, mixture models are being increasingly exploited as a convenient, semiparametric way in which to model unknown distributional shapes. This is in addition to their obvious applications where there is group-structure in the data or where the aim is to explore the data for such structure, as in a cluster analysis. It has now been three decades since the publication of the monograph by McLachlan & Basford (1988) with an emphasis on the potential usefulness of mixture models for inference and clustering. Since then, mixture models have attracted the interest of many researchers and have found many new and interesting fields of application. Thus, the literature on mixture models has expanded enormously, and as a consequence, the bibliography here can only provide selected coverage.
Finite mixtures of skew distributions have emerged as an effective tool in modelling heterogeneous data with asymmetric features. With various proposals appearing rapidly in the recent years, which are similar but not identical, the connections between them and their relative performance becomes rather unclear. This paper aims to provide a concise overview of these developments by presenting a systematic classification of the existing skew symmetric distributions into four types, thereby clarifying their close relationships. This also aids in understanding the link between some of the proposed expectation-maximization (EM) based algorithms for the computation of the maximum likelihood estimates of the parameters of the models. The final part of this paper presents an illustration of the performance of these mixture models in clustering a real dataset, relative to other non-elliptically contoured clustering methods and associated algorithms for their implementation.
In biomedical applications, an experimenter encounters different potential sources of variation in data such as individual samples, multiple experimental conditions, and multivariate responses of a panel of markers such as from a signaling network. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. While computational methods can identify cell populations in individual samples, without the ability to automatically match them across samples, it is difficult to compare and characterize the populations in typical experiments, such as those responding to various stimulations or distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template – used for registering populations across samples, and classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts. Software for fitting the JCM models have been implemented in an R package EMMIX-JCM, available from http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/.
The mixture of factor analyzers (MFA) model provides a powerful tool for analyzing high-dimensional data as it can reduce the number of free parameters through its factor-analytic representation of the component covariance matrices. This paper extends the MFA model to incorporate a restricted version of the multivariate skew-normal distribution to model the distribution of the latent component factors, called mixtures of skew-normal factor analyzers (MSNFA). The proposed MSNFA model allows us to relax the need for the normality assumption for the latent factors in order to accommodate skewness in the observed data. The MSNFA model thus provides an approach to model-based density estimation and clustering of high-dimensional data exhibiting asymmetric characteristics. A computationally feasible ECM algorithm is developed for computing the maximum likelihood estimates of the parameters. Model selection can be made on the basis of three commonly used information-based criteria. The potential of the proposed methodology is exemplified through applications to two real examples, and the results are compared with those obtained from fitting the MFA model.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.