Proceedings of the 26th Annual International Conference on Machine Learning 2009
DOI: 10.1145/1553374.1553405

A scalable framework for discovering coherent co-clusters in noisy data

Abstract: Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters from such datasets. Additionally, since clusters could exist in different subspaces of the feature space, a co-clustering algorithm that simu…

Citations: cited by 29 publications (24 citation statements)
References: 25 publications
“…In the specific context of co-clustering of general data, there have been some prior works such as [37], [6]. However, these models either do not handle relevant object selection, do not exploit pairwise object similarities, and need the number of clusters to be specified a priori.…”
Section: Related Work (mentioning)
confidence: 99%
“…We compared our message passing (EBMP) and greedy (EBG) algorithms with two other coherent biclustering algorithms, Chen&Church [6] (using the BicAT software from http://www.tik.ethz.ch/sop/bicat/) and ROCC [8], on four synthetic datasets. The first two of the four datasets each contain fifteen 10×10 data matrices, and the third and fourth datasets each contain fifteen 50×50 data matrices.…”
Section: Experiments With Synthetic Data (mentioning)
confidence: 99%
“…Sometimes these names refer to different variants of the biclustering problem: for example, the term "co-clustering" was used in [9] to describe the problem of finding biclusters that form a checkerboard pattern in the data matrix. Applications of biclustering include: finding groups of genes that display similar expression patterns under subsets of time points or conditions [6,13,25,24,21,7,8]; finding groups of users who share interest in certain subsets of movies [25,24]; finding clusters of documents that share subsets of words [9]; and finding correlations between groups of words and phrases in a natural language corpus to induce grammar rules [1,23].…”
Section: Introduction (mentioning)
confidence: 99%