In recent work, robust mixture modelling approaches using skewed distributions have been explored to accommodate asymmetric data. We introduce parsimony by developing skew-t and skew-normal analogues of the popular Gaussian parsimonious clustering model (GPCM) family that employ an eigenvalue decomposition of a positive-semidefinite matrix. The methods developed in this paper are compared to existing models in both unsupervised and semi-supervised classification frameworks. Parameter estimation is carried out using the expectation-maximization algorithm, and models are selected using the Bayesian information criterion (BIC). The efficacy of these extensions is illustrated on simulated and benchmark clustering data sets.
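In the GPCM family, each component covariance matrix is commonly written Σ_g = λ_g D_g A_g D_g′. Here λ_g = |Σ_g|^(1/d) controls volume, A_g is a diagonal matrix with determinant 1 that controls shape, and D_g is the orthogonal matrix of eigenvectors that controls orientation. Constraining these pieces to be equal or variable across components yields the family's parsimonious models. The minimal sketch below assumes this standard form and illustrates it for a single 2 × 2 matrix. It also includes a BIC helper assuming the convention BIC = 2ℓ − ρ log n, where ρ is the number of free parameters and the larger value is preferred. It is not the paper's code, and all function names are hypothetical. Because A is normalised by λ, the sketch requires a positive-definite matrix, since a singular matrix has λ = 0.

```python
import math


def decompose_2x2(sigma):
    """Split a 2x2 symmetric positive-definite matrix into (lam, shape, orient)
    with sigma = lam * D diag(shape) D^T (hypothetical helper, GPCM-style)."""
    (a, b), (b2, c) = sigma
    if abs(b - b2) > 1e-12:
        raise ValueError("matrix must be symmetric")
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("matrix must be positive definite")
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    e1, e2 = mid + rad, mid - rad          # eigenvalues, e1 >= e2 > 0
    if b != 0:
        v = (e1 - c, b)                    # solves (sigma - e1*I) v = 0
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v)
    v1 = (v[0] / norm, v[1] / norm)
    v2 = (-v1[1], v1[0])                   # orthogonal second eigenvector
    lam = math.sqrt(det)                   # volume: |sigma|^(1/d) with d = 2
    shape = (e1 / lam, e2 / lam)           # diagonal of A, so det(A) = 1
    orient = ((v1[0], v2[0]), (v1[1], v2[1]))  # D with eigenvectors as columns
    return lam, shape, orient


def recompose_2x2(lam, shape, orient):
    """Rebuild sigma = lam * D diag(shape) D^T from its three pieces."""
    (d11, d12), (d21, d22) = orient
    s1, s2 = shape
    return (
        (lam * (s1 * d11 * d11 + s2 * d12 * d12),
         lam * (s1 * d11 * d21 + s2 * d12 * d22)),
        (lam * (s1 * d21 * d11 + s2 * d22 * d12),
         lam * (s1 * d21 * d21 + s2 * d22 * d22)),
    )


def bic(loglik, n_free_params, n_obs):
    """BIC as 2*loglik - rho*log(n); under this convention larger is better."""
    return 2.0 * loglik - n_free_params * math.log(n_obs)
```

Model selection would then compute `bic` for each fitted candidate model and keep the one with the largest value.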
Traditionally, there are three species of classification: unsupervised, supervised, and semi-supervised. Supervised and semi-supervised classification differ in whether or not weight is given to unlabelled observations in the classification procedure. In unsupervised classification, or clustering, all observations are unlabelled and hence full weight is given to unlabelled observations. When some observations are unlabelled, it can be very difficult to choose the optimal level of supervision a priori, and the consequences of a sub-optimal choice can be non-trivial. A flexible fractionally-supervised approach to classification is introduced, where any level of supervision, ranging from unsupervised to supervised, can be attained. Our approach uses a weighted likelihood, wherein weights control the relative roles that labelled and unlabelled data play in building a classifier. A comparison between our approach and the traditional species is presented using simulated and real data. Gaussian mixture models are used as a vehicle to illustrate our fractionally-supervised classification approach; however, it is broadly applicable, and variations on the postulated model can easily be made.
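A weighted likelihood of this kind can be sketched for a univariate Gaussian mixture. This minimal sketch assumes one simple weighting, (1 − α)·ℓ_labelled + α·ℓ_all. Here ℓ_labelled is the complete-data log-likelihood of the labelled points, and ℓ_all is the mixture log-likelihood of every point with its label discarded. This form is chosen only so that α = 0 and α = 1 reproduce the supervised and unsupervised endpoints described above; the paper's exact weighting may differ. The function names and the parameter `alpha` are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Univariate Gaussian density N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fractional_loglik(labelled, unlabelled, weights, means, sds, alpha):
    """Weighted log-likelihood blending a supervised and an unsupervised term.

    labelled   -- list of (x, g) pairs whose component index g is known
    unlabelled -- list of x values with unknown component
    alpha      -- assumed weight in [0, 1]: 0 keeps only the supervised term,
                  1 clusters all points with their labels ignored
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Supervised term: a labelled point contributes only its own component.
    supervised = sum(math.log(weights[g] * normal_pdf(x, means[g], sds[g]))
                     for x, g in labelled)
    # Unsupervised term: every point, labels discarded, contributes the mixture density.
    all_points = [x for x, _ in labelled] + list(unlabelled)
    unsupervised = sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)))
        for x in all_points)
    return (1.0 - alpha) * supervised + alpha * unsupervised
```

In a full classifier, α would be held fixed while an EM-type algorithm maximises this quantity over the mixing weights, means and standard deviations. The sketch only evaluates it.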
Background
In the context of infectious disease, sequence clustering can provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden of this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify appropriate threshold values, which may vary widely depending on the application.

Results
This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while requiring only a fraction of the time to complete the analysis.

Conclusions
Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.

Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0791-x) contains supplementary material, which is available to authorized users.
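The idea of inferring clusters from a gap in sorted pairwise distances can be illustrated with a toy Python sketch. It is not the authors' Gap Procedure, whose details are not given here. It assumes aligned, equal-length sequences compared by uncorrected p-distance. It places a single cut-off at the midpoint of the largest gap in the sorted distinct pairwise distances and links any pair below that cut-off (single linkage, via union-find). All function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def largest_gap_threshold(distances):
    """Midpoint of the largest gap between consecutive sorted distinct distances."""
    d = sorted(set(distances))
    if len(d) < 2:
        raise ValueError("need at least two distinct distances")
    _, i = max((d[k + 1] - d[k], k) for k in range(len(d) - 1))
    return (d[i] + d[i + 1]) / 2.0


def gap_clusters(seqs):
    """Group sequences whose pairwise distance falls below the largest-gap cut-off.

    Clusters are connected components of the 'close pair' graph (single linkage).
    """
    n = len(seqs)
    pairs = {(i, j): p_distance(seqs[i], seqs[j])
             for i, j in combinations(range(n), 2)}
    cut = largest_gap_threshold(pairs.values())
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for (i, j), dist in pairs.items():
        if dist < cut:
            parent[find(i)] = find(j)      # merge the two components
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For the four toy sequences `"AAAAAAAAAA"`, `"AAAAAAAAAT"`, `"CCCCCCCCCC"` and `"CCCCCCCCCG"`, the only gap lies between 0.1 and 1.0, so the cut-off is 0.55 and the two near-identical pairs form the clusters `[[0, 1], [2, 3]]`. A single largest gap always forces a split, so a practical procedure would need further rules, for example for data with no cluster structure.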
Tumour heterogeneity plays a large role in the response of tumour tissues to radiation therapy. Inherent biological and physical heterogeneity, and even heterogeneity in dose deposition, all contribute to the observed response. Here, we use Haralick textural analysis to quantify the glycogen production response, as measured by Raman spectroscopic mapping, of tumours irradiated within a murine model. While more than 20 Haralick features have been proposed, we concentrate on five of the most prominent: homogeneity, local homogeneity, contrast, entropy, and correlation. We show that these Haralick features can quantify the inherent heterogeneity of the Raman spectroscopic maps of tumour response to radiation. Furthermore, our results indicate that the Haralick textural features show a statistically significant dose-dependent variation in response heterogeneity, specifically in glycogen production, in tumours irradiated with clinically relevant doses of ionizing radiation. These results indicate that Haralick textural analysis provides a quantitative methodology for understanding the response of murine tumours to radiation therapy. Future work could, for example, use Haralick textural features to understand the heterogeneity of radiation response in biopsied patient tumour samples, which remain the standard for investigating patient tumours.
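Haralick features are computed from a grey-level co-occurrence matrix (GLCM), which counts how often pairs of quantised values occur at a fixed pixel offset. This minimal sketch makes several assumptions. It treats each map as an integer-quantised 2-D array (for example, a glycogen score binned into a few grey levels) and uses a single horizontal offset with a symmetric, normalised GLCM. It also uses common textbook definitions, reading "homogeneity" as the angular second moment and "local homogeneity" as the inverse difference moment. The paper's exact quantisation, offsets and definitions may differ, and the function names are hypothetical.

```python
import math


def glcm(image, levels, dr=0, dc=1, symmetric=True):
    """Normalised grey-level co-occurrence matrix for one pixel offset (dr, dc).

    image  -- 2-D list of integer grey levels in range(levels)
    levels -- number of grey levels after quantisation
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # count each neighbour pair in both directions
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def haralick_features(p):
    """Five texture features from a normalised co-occurrence matrix p."""
    n = len(p)
    # Zero cells contribute nothing to any sum, so skip them (also avoids log(0)).
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n) if p[i][j] > 0]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        # angular second moment, read here as "homogeneity"
        "homogeneity": sum(v * v for _, _, v in cells),
        # inverse difference moment, read here as "local homogeneity"
        "local_homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells),  # natural log
        # undefined (NaN) for a perfectly uniform map, where both sds are 0
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else float("nan"),
    }
```

For example, a 2 × 2 checkerboard `[[0, 1], [1, 0]]` with the default horizontal offset gives a contrast of 1, an entropy of log 2 and a correlation of −1, while a uniform map gives zero contrast and zero entropy. Per-tumour features computed this way could then be compared across dose groups.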