2005
DOI: 10.1198/016214504000001565

Bayesian Variable Selection in Clustering High-Dimensional Data

Abstract: Over the last decade, technological advances have generated an explosion of data with substantially smaller sample size relative to the number of covariates (p ≫ n). A common goal in the analysis of such data involves uncovering the group structure of the observations and identifying the discriminating variables. In this article we propose a methodology for addressing these problems simultaneously. Given a set of variables, we formulate the clustering problem in terms of a multivariate normal mixture model with…
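
To make the setup concrete, below is a minimal Python sketch of the idea in the abstract: a latent binary inclusion vector (gamma below) marks the discriminating variables, which follow cluster-specific Gaussians, while the remaining variables follow one shared Gaussian. The function and variable names and the toy data are illustrative assumptions; the paper's actual method integrates out the mixture parameters and explores gamma by MCMC.

# Minimal sketch (an illustration, not the authors' implementation):
# gamma marks discriminating variables, which get cluster-specific Gaussians;
# the remaining variables share a single Gaussian across all observations.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

def log_joint(X, z, gamma, means, covs, mu0, cov0):
    """Log-likelihood of X given cluster labels z and inclusion vector gamma."""
    d = gamma.astype(bool)
    ll = 0.0
    for k in np.unique(z):
        Xk = X[z == k][:, d]  # discriminating variables, cluster k
        ll += multivariate_normal.logpdf(Xk, means[k][d], covs[k][np.ix_(d, d)]).sum()
    nd = ~d
    if nd.any():              # non-discriminating variables, one shared Gaussian
        ll += multivariate_normal.logpdf(X[:, nd], mu0[nd], cov0[np.ix_(nd, nd)]).sum()
    return ll

# Toy data: two clusters that differ only in the first 2 of 5 variables.
n, p = 60, 5
z = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[z == 1, :2] += 3.0

means = {0: X[z == 0].mean(axis=0), 1: X[z == 1].mean(axis=0)}
covs = {0: np.eye(p), 1: np.eye(p)}
gamma_true = np.array([1, 1, 0, 0, 0])   # selects the separating variables
gamma_wrong = np.array([0, 0, 1, 1, 1])  # selects only noise variables
print(log_joint(X, z, gamma_true, means, covs, X.mean(axis=0), np.eye(p)))
print(log_joint(X, z, gamma_wrong, means, covs, X.mean(axis=0), np.eye(p)))  # lower

On the toy data, the inclusion vector selecting the truly discriminating variables attains the higher log-likelihood, which is the signal a stochastic search over gamma would exploit.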

Cited by 215 publications (222 citation statements) · References 24 publications
“…Data-clustering algorithms have been developed for more than half a century (1). Significant advances in the last two decades include spectral clustering (2-4), generalizations of classic center-based methods (5, 6), mixture models (7, 8), mean shift (9), affinity propagation (10), subspace clustering (11-13), nonparametric methods (14, 15), and feature selection (16-20). Despite these developments, no single algorithm has emerged to displace the k-means scheme and its variants (21), notwithstanding the known drawbacks of such center-based methods: sensitivity to initialization, limited effectiveness in high-dimensional spaces, and the requirement that the number of clusters be set in advance.…”
mentioning · confidence: 99%
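
As a small aside (my illustration, not from the quoted paper), the initialization sensitivity mentioned in the excerpt above is easy to observe: running k-means with single random initializations typically lands in different local optima.

# Demonstration of k-means initialization sensitivity (illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=1)
inertias = [
    KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(5)
]
print(inertias)  # typically varies across seeds, i.e., different local optima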
“…Following ref. 36, we assign larger probabilities to merges of similar branches, such that …, where S is the total number of branches and S_1 is the number of nonempty branches. This completely specifies the branch-splitting move of the RJMCMC (see SI Appendix, Text 3 for the other moves).…”
Section: Results · mentioning · confidence: 99%
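
The excerpt elides the actual assignment formula, so the following Python sketch is only one plausible reading (an assumption, not the quoted paper's rule): propose merging a pair of branches with probability proportional to a similarity of their means, so that similar branches are merged more often.

# Hypothetical similarity-weighted merge proposal for an RJMCMC merge move.
import numpy as np

rng = np.random.default_rng(0)

def propose_merge(branch_means, temperature=1.0):
    """Pick a pair of branches to merge, favoring similar (nearby) branches."""
    S = len(branch_means)
    pairs, weights = [], []
    for i in range(S):
        for j in range(i + 1, S):
            d = np.linalg.norm(branch_means[i] - branch_means[j])
            pairs.append((i, j))
            weights.append(np.exp(-d / temperature))  # closer => larger weight
    probs = np.array(weights) / np.sum(weights)
    idx = rng.choice(len(pairs), p=probs)
    return pairs[idx], probs[idx]  # the proposal prob enters the RJMCMC ratio

means = [np.array([0.0, 0.0]), np.array([0.2, 0.1]), np.array([5.0, 5.0])]
print(propose_merge(means))  # most often proposes merging branches 0 and 1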
“…Following ref. 36, we denote the transition probability of this move as q(θ_new | θ_old), and assign…”
Section: Results · mentioning · confidence: 99%
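
For background (standard reversible-jump MCMC in the sense of Green 1995, not anything specific to the excerpt), the proposal density q(θ_new | θ_old) enters the acceptance probability of the move as

\alpha = \min\left\{ 1,\; \frac{\pi(\theta_{\text{new}})\, q(\theta_{\text{old}} \mid \theta_{\text{new}})}{\pi(\theta_{\text{old}})\, q(\theta_{\text{new}} \mid \theta_{\text{old}})}\, \lvert J \rvert \right\},

where \pi is the target posterior and \lvert J \rvert is the Jacobian of the dimension-matching transformation between parameter spaces.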
“…This is attractive, for example, when the number of genes on a microarray is the relevant sample size, thus allowing flexible semi-parametric representations. Such approaches are discussed, among others, by Broët (2002), Dahl (2003), and Tadesse et al. (2005). The latter exploit the clustering implicitly defined by the mixture model.…”
Section: Introduction · mentioning · confidence: 99%
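
To illustrate what "the clustering implicitly defined by the mixture model" means in practice (a generic sketch, not the cited papers' exact procedure): once a mixture is fit, each observation is allocated to the component with the highest posterior responsibility.

# Generic illustration of mixture-model-induced clustering.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gm.predict(X)      # argmax over posterior responsibilities per point
resp = gm.predict_proba(X)  # soft allocation probabilities
print(labels[:5], resp[0])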