2011
DOI: 10.3389/fgene.2011.00088

Comparison of Clustering Methods for Investigation of Genome-Wide Methylation Array Data

Abstract: Genome-wide methylation arrays have proved very informative for investigating both clinical and biological questions in human epigenomics. Clustering methods, used either to explore these data or to compare against an a priori grouping (e.g., normal versus disease), allow groupings of data to be assessed without user bias. However, no consensus has been reached on which methods to use for clustering methylation array data. To determine the most appropriate clustering method for analysis of i…

Citations: cited by 36 publications (29 citation statements)
References: 27 publications
“…K-means clustering of the RNA data, using the Silhouette measure (Clifford et al, 2011) to identify the best k (Fig. S1B), revealed two distinct cell populations that were roughly equal in size (Fig.…”
Section: Results (mentioning)
confidence: 99%
“…K-means clustering was done in MATLAB using the squared Euclidean distance of normalized data (z-scores). To determine the optimal k, we applied every value from 2 to 20, assessed the average Silhouette value (Clifford et al, 2011) for each clustering result (Figure S1B), and selected k = 2, which gave the largest mean Silhouette value. Differentially expressed genes were identified using a two-sided Wilcoxon-Mann-Whitney rank sum test implemented in the “coin” package in R. Differences between populations were determined by subtracting median Ct values (equivalent to log2 expression levels).…”
Section: Methods (mentioning)
confidence: 99%
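
The k-selection procedure quoted above (z-score the data, run k-means with squared Euclidean distance for k = 2 to 20, keep the k with the largest mean silhouette) can be sketched as follows. This is a minimal illustration in plain Python, not the cited authors' MATLAB code: the function names, the toy two-blob data, the number of restarts, and the use of ordinary Euclidean distance inside the silhouette are all assumptions made for the example.

```python
import math
import random


def zscore(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette value; Euclidean distance is an assumption here."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(points[i], points[j]) for j in idx) / len(idx)
                for other, idx in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values, seed=0, restarts=5):
    """Return the k with the largest mean silhouette, plus all scores."""
    rng = random.Random(seed)
    scores = {k: max(mean_silhouette(points, kmeans(points, k, rng))
                     for _ in range(restarts))
              for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two well-separated Gaussian blobs of 30 points each.
rng = random.Random(42)
blob_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
blob_b = [[rng.gauss(8, 1), rng.gauss(8, 1)] for _ in range(30)]
data = zscore(blob_a + blob_b)

chosen_k, scores = best_k(data, range(2, 21))
print(chosen_k, round(scores[chosen_k], 3))
```

On data with two clearly separated groups, the mean silhouette peaks at k = 2, which mirrors the selection described in the quoted methods.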
“…Multiscale bootstrap values for each node determined by resampling of 1,000 replicates are shown. Canberra distance and Ward Linkage was determined to provide the greatest cluster separation based on mean silhouette width using the ClusterRank software (Clifford et al., 2011).…”
Section: Results (mentioning)
confidence: 99%
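
As a rough illustration of the combination named in this statement (Canberra distance, Ward linkage, mean silhouette width), the sketch below clusters a small made-up set of beta values. It does not reproduce ClusterRank. The function names and data are hypothetical, and applying the Ward (Lance-Williams) update to squared Canberra dissimilarities is an assumption; the update is only exact for Euclidean distance.

```python
import random


def canberra(a, b):
    """Canberra distance: sum of |x - y| / (|x| + |y|), skipping 0/0 terms."""
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def ward_labels(points, n_clusters, dist=canberra):
    """Agglomerative clustering using the Ward (Lance-Williams) update."""
    clusters = {i: [i] for i in range(len(points))}
    # Squared dissimilarities, keyed by (smaller id, larger id).
    d2 = {(i, j): dist(points[i], points[j]) ** 2
          for i in clusters for j in clusters if i < j}
    next_id = len(points)
    while len(clusters) > n_clusters:
        (i, j), dij = min(d2.items(), key=lambda kv: kv[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        del d2[(i, j)]
        for k, members in clusters.items():
            nk = len(members)
            dik = d2.pop((min(i, k), max(i, k)))
            djk = d2.pop((min(j, k), max(j, k)))
            d2[(k, next_id)] = ((ni + nk) * dik + (nj + nk) * djk
                                - nk * dij) / (ni + nj + nk)
        clusters[next_id] = merged
        next_id += 1
    labels = [0] * len(points)
    for label, members in enumerate(clusters.values()):
        for idx in members:
            labels[idx] = label
    return labels


def mean_silhouette(points, labels, dist=canberra):
    """Mean silhouette width under the given dissimilarity."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            continue  # contributes a silhouette of 0
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for other, idx in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical beta values: 6 samples near 0.2 and 6 near 0.8, 5 probes each.
rng = random.Random(1)
low = [[0.2 + rng.uniform(-0.03, 0.03) for _ in range(5)] for _ in range(6)]
high = [[0.8 + rng.uniform(-0.03, 0.03) for _ in range(5)] for _ in range(6)]
samples = low + high

labels = ward_labels(samples, 2)
print(labels, round(mean_silhouette(samples, labels), 3))
```

Comparing the mean silhouette width across several distance and linkage choices, and keeping the best-scoring pair, is one way to read "greatest cluster separation" in the statement above.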
“…This enables the exploration of complex data without the need for a priori definition of groups that may be biased by experimenter expectations. The most appropriate algorithm for clustering was determined empirically using the ClusterRank software (Clifford et al., 2011). To identify statistically significant differentially methylated sites, normalized data were log2 transformed and a t-test with a false discovery rate (FDR) multiple hypothesis correction was conducted to compare the mean scores between identified clusters.…”
Section: Methods (mentioning)
confidence: 99%
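
The FDR step mentioned in this statement can be illustrated with a short sketch. The statement does not say which FDR procedure was used; Benjamini-Hochberg is assumed here because it is a common choice. The function name and the p-values are made up for the example, and the per-site t-tests that would produce such p-values are not shown.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-site p-values, e.g. from t-tests between two clusters.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(adj_p) if q <= 0.05]
print(adj_p, significant)
```

Sites whose adjusted p-value falls at or below the chosen FDR threshold (0.05 in this example) would be reported as differentially methylated.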
“…Another study used both techniques to cluster transcription factors [101]. Clifford et al [102] compared hierarchical clustering to other clustering techniques (k-means, k-medoids, and fuzzy clustering) to determine the most appropriate one for analyzing Illumina methylation data. Since no significant difference was found between the methods, a combination was proposed; the final output will be given by the method that achieves the best results in each case.…”
Section: Bioinformatics Of Personalized Epigenetics (mentioning)
confidence: 99%