Finite mixture models are increasingly used to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of analysis purposes. Recently, version 5 of the package has been made available on CRAN. This updated version adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.
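As a rough illustration of the EM algorithm mentioned above, the following is a minimal sketch of fitting a two-component Gaussian mixture to one-dimensional data. It is written in Python for illustration and does not reproduce mclust's implementation, covariance structures, or initialisation strategies; the function names and the user-supplied starting means (`mean_init`) are assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, mean_init, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch only).

    mean_init: pair of starting means (hypothetical initialisation, chosen by the caller).
    Returns (weights, means, variances).
    """
    n = len(data)
    means = list(mean_init)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var, overall_var]  # start both components at the pooled variance
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: update mixing proportions, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # small floor to avoid a degenerate component

    return weights, means, variances
```

The fixed number of iterations keeps the sketch short; a practical implementation would instead monitor the change in log-likelihood and stop once it falls below a tolerance.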
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. High-dimensional data are increasingly common, and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.
Fig 1: Local and global independence assumptions. In the example, z is the group membership variable, X1, X2 and X3 are relevant clustering variables, while X4 and X5 are irrelevant and not related to z. Under the local independence assumption there are no edges among the relevant variables. Under the global independence assumption there is no edge between the set of relevant variables and the set of irrelevant ones.
The identification of the most relevant clinical criteria related to low back pain disorders may aid the evaluation of the nature of pain suffered in a way that usefully informs patient assessment and treatment. Data concerning low back pain can be categorical in nature, in the form of a checklist in which each item denotes the presence or absence of a clinical condition. Latent class analysis is a model-based clustering method for multivariate categorical responses which can be applied to such data for a preliminary diagnosis of the type of pain. In this work we propose a variable selection method for latent class analysis applied to the selection of the most useful variables in detecting the group structure in the data. The method is based on the comparison of two different models and allows the discarding of those variables with no group information and those variables carrying the same information as the already selected ones. We consider a swap-stepwise algorithm where at each step the models are compared through an approximation to their Bayes factor. The method is applied to the selection of the clinical criteria most useful for the clustering of patients in different classes. It is shown to perform a parsimonious variable selection and to give a clustering performance comparable to the expert-based classification of patients into three classes of pain.
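The abstract does not say which approximation to the Bayes factor is used. A common choice in model-based clustering is the BIC approximation, in which twice the log Bayes factor of model A against model B is approximated by the difference of their BIC values. The sketch below assumes that choice; the function names and the BIC sign convention (larger is better) are assumptions of this example rather than details taken from the paper.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2 * log-likelihood - k * log(n)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def approx_two_log_bayes_factor(loglik_a, k_a, loglik_b, k_b, n_obs):
    """Approximate 2 * log(Bayes factor of model A against model B) by BIC_A - BIC_B.

    A positive value is evidence in favour of model A, a negative value in favour of B.
    """
    return bic(loglik_a, k_a, n_obs) - bic(loglik_b, k_b, n_obs)
```

In a stepwise search of this kind, model A would typically treat a candidate variable as carrying clustering information and model B would not, so the sign of the difference decides whether the variable is added, removed or swapped.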
Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can be easily over-parameterized. For this reason, parsimonious models have been developed via covariance matrix decompositions or assuming local independence. However, these remedies do not allow for direct estimation of sparse covariance matrices nor do they take into account that the structure of association among the variables can vary from one cluster to the other. To this end, we introduce mixtures of Gaussian covariance graph models for model-based clustering with sparse covariance matrices. A penalized likelihood approach is employed for estimation and a general penalty term on the graph configurations can be used to induce different levels of sparsity and incorporate prior knowledge. Model estimation is carried out using a structural-EM algorithm for parameters and graph structure estimation, where two alternative strategies based on a genetic algorithm and an efficient stepwise search are proposed for inference. With this approach, sparse component covariance matrices are directly obtained. The framework results in a parsimonious model-based clustering of the data via a flexible model for the within-group joint distribution of the variables. Extensive simulated data experiments and application to illustrative datasets show that the method attains good classification performance and model quality.
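To make the quadratic growth in parameters concrete, the following sketch counts the free parameters of a G-component Gaussian mixture on d variables, once with unrestricted component covariance matrices and once under local independence (diagonal covariances). The function names are illustrative and not taken from any package.

```python
def n_params_full(d, G):
    """Free parameters with unrestricted covariance matrices.

    (G - 1) mixing proportions + G * d means + G * d * (d + 1) / 2 covariance entries.
    """
    return (G - 1) + G * d + G * d * (d + 1) // 2


def n_params_diagonal(d, G):
    """Free parameters under local independence (diagonal covariance matrices).

    (G - 1) mixing proportions + G * d means + G * d variances.
    """
    return (G - 1) + G * d + G * d
```

For example, with d = 10 variables and G = 3 components the unrestricted model has 197 free parameters against 62 for the diagonal model, and the gap widens quadratically as d grows.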
Objectives: Lower extremity (LE) injuries are common in Gaelic games and lead to a significant economic and injury burden. Balance is considered a predictor of injury in other sports; however, no research has examined its effect on LE injury in Gaelic games. This study aims to present normative data for the Y Balance Test (YBT), determine whether the YBT can identify those at risk of contact and non-contact LE and ankle injuries, and generate population-specific cutoff points in adolescent and collegiate Gaelic games. Design: Prospective cohort study. Methods: A convenience sample of 636 male adolescent (n = 293, age = 15.7 ± 0.7 years) and collegiate (n = 343, age = 19.3 ± 1.9 years) Gaelic footballers and hurlers was recruited. The YBT was completed and injuries were assessed at least weekly over one season. Univariate and logistic regression analyses were performed to examine whether the YBT can classify those at risk of LE-combined and ankle injuries. ROC curves were used to identify cutoff points. Results: Gaelic players performed poorly in the YBT, and between 31% and 57% of all players were identified as at risk of injury at pre-season using previously published YBT cutoff points. However, poor YBT scores were unable to identify those at risk of contact or non-contact LE-combined and ankle injuries with sufficient sensitivity. High specificity was noted for contact LE-combined and non-contact ankle injuries. Conclusions: The YBT as a sole screening method to classify those at risk of LE and ankle injuries in Gaelic games is questionable. However, the YBT may be a useful preliminary screening tool to identify those not at risk of contact LE-combined or non-contact ankle injury. Generalising published cutoff points from other sports is not supported.
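The abstract states that ROC curves were used to identify cutoff points but does not name the criterion. One common option is to pick the cutoff that maximises Youden's J (sensitivity + specificity - 1). The sketch below assumes that criterion, and also assumes that a player is flagged as at risk when the score is at or below the cutoff; the function names are hypothetical and not from the study.

```python
def sens_spec(scores, injured, cutoff):
    """Sensitivity and specificity when 'score <= cutoff' flags a player as at risk.

    Assumes the data contain at least one injured and one uninjured player.
    """
    tp = sum(1 for s, y in zip(scores, injured) if y and s <= cutoff)
    fn = sum(1 for s, y in zip(scores, injured) if y and s > cutoff)
    tn = sum(1 for s, y in zip(scores, injured) if not y and s > cutoff)
    fp = sum(1 for s, y in zip(scores, injured) if not y and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, injured):
    """Return the observed score that maximises Youden's J = sensitivity + specificity - 1.

    Ties are resolved in favour of the lowest cutoff.
    """
    best_c, best_j = None, None
    for c in sorted(set(scores)):
        se, sp = sens_spec(scores, injured, c)
        j = se + sp - 1.0
        if best_j is None or j > best_j:
            best_c, best_j = c, j
    return best_c
```

Reporting sensitivity and specificity separately at the chosen cutoff, as the study does, shows whether a test is better at ruling risk in or ruling it out; a single summary such as J can hide that asymmetry.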