2007
DOI: 10.1186/1471-2105-8-s2-s7

Model order selection for bio-molecular data clustering

Abstract: Background: Cluster analysis has been widely applied to investigate structure in bio-molecular data. A drawback of most clustering algorithms is that they cannot automatically detect the "natural" number of clusters underlying the data, and in many cases we do not have enough "a priori" biological knowledge to evaluate both the number of clusters and their validity. Recently several methods based on the concept of stability have been proposed to estimate the "optimal" number of clusters, but despite thei…

citations
Cited by 38 publications
(45 citation statements)
references
References 33 publications
“…Using the algorithm described in Section 2 and the standard average-linkage algorithm with Euclidean distance to perform the hierarchical clusterings, we iterated 50 random projections from the original 14-dimensional space to a lower 10-dimensional space, using Bernoulli random projections (Bertoni and Valentini, 2007).…”
Section: Results (mentioning)
confidence: 99%
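The random-projection step mentioned in the statement above can be illustrated with a minimal sketch. It assumes that a Bernoulli random projection is a d' x d matrix whose entries are +1 or -1 with equal probability, scaled by 1/sqrt(d'); that definition, the synthetic Gaussian data, and all function names are illustrative assumptions, not taken from the cited papers. The sketch only checks how well one pairwise distance is preserved across 50 projections from 14 to 10 dimensions; it does not perform the hierarchical clustering.

```python
import math
import random


def bernoulli_projection_matrix(d, d_prime, rng):
    """Build a d_prime x d matrix with entries +/- 1/sqrt(d_prime).

    Assumption: each sign is drawn independently with probability 1/2.
    """
    scale = 1.0 / math.sqrt(d_prime)
    return [[scale * rng.choice((-1.0, 1.0)) for _ in range(d)]
            for _ in range(d_prime)]


def project(matrix, point):
    """Map a d-dimensional point to d_prime dimensions (matrix-vector product)."""
    return [sum(r * x for r, x in zip(row, point)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
d, d_prime = 14, 10          # dimensions quoted in the citation statement
n_points, n_projections = 20, 50

# Synthetic data standing in for the real 14-dimensional data set.
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_points)]

# Ratio of projected to original squared distance for one pair of points,
# recorded once per random projection.
original_sq = euclidean(points[0], points[1]) ** 2
ratios = []
for _ in range(n_projections):
    P = bernoulli_projection_matrix(d, d_prime, rng)
    projected = [project(P, p) for p in points]
    ratios.append(euclidean(projected[0], projected[1]) ** 2 / original_sq)

mean_ratio = sum(ratios) / len(ratios)
print(f"mean squared-distance ratio over {n_projections} projections: {mean_ratio:.3f}")
```

Under the scaling assumed here, the squared-distance ratio equals 1 in expectation, so the printed mean should be close to 1. Each projected data set could then be passed to an average-linkage clustering routine to compare the resulting partitions.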
“…Considering that clusters of genes may show a hierarchical multi-level organisation (Bertoni and Valentini, 2007), we could reduce the computational complexity by examining a linear number of clusters, computed by a hierarchical clustering algorithm.…”
Section: Introduction (mentioning)
confidence: 99%
“…Three of the papers are particularly focused in this task. Yoon et al [5] introduce a robust preprocessing method for treating missing values in gene expression data, and Bertoni and Valentini [6] decide the number of clusters based on stability against fluctuations caused by random projections. In the only paper on metabonomics, Vehtari et al [7] introduce a full-Bayesian way of modeling the mapping between NMR spectra and clinical variables.…”
Section: Summary of the Supplement (mentioning)
confidence: 99%
“…Nevertheless, Gaussian mixtures assume a probabilistic (fuzzy) model for the data, and so these approaches can not be directly applied to the validation of a crisp partition. Finally, some recent techniques based on stability criteria measure the reproducibility of clustering solutions on a second sample [27,28,29]. They have been applied to cluster validation mainly for gene expression data sets.…”
Section: Introduction (mentioning)
confidence: 99%