Automatic subspace clustering of high dimensional data for data mining applications

Agrawal, Rakesh; Gehrke, Johannes; Gunopulos, Dimitrios; Raghavan, Prabhakar

doi:10.1145/276305.276314

Cited by 830 publications (626 citation statements)

References 33 publications

Citation statements by type: Supporting, Mentioning (608), Contrasting, Unclassified


“…The problem of co-clustering is also closely related to the problem of subspace clustering [7] or projected clustering [5] in quantitative data in the database literature. In this problem, the data is clustered by simultaneously associating it with a set of points and subspaces in multi-dimensional space.…”

Section: Co-clustering Words and Documents (mentioning)

confidence: 99%

A Survey of Text Clustering Algorithms

2012


Clustering is a widely studied data mining problem in the text domains. The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In this chapter, we will provide a detailed survey of the problem of text clustering. We will study the key challenges of the clustering problem, as it applies to the text domain. We will discuss the key methods used for text clustering, and their relative advantages. We will also discuss a number of recent advances in the area in the context of social network and linked data.
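The subspace clustering problem mentioned in the first quotation is the one CLIQUE, the paper indexed on this page, addresses with a grid-based, bottom-up search. The sketch below illustrates that idea under stated assumptions: data is scaled to the unit cube, each dimension is cut into `xi` equal-width intervals, and a unit is treated as dense when it holds more than `tau * n` points. The function name `dense_units` and its signature are hypothetical. Candidates in k+1 dimensions are built only from dense k-dimensional units, since a unit cannot be dense unless all of its lower-dimensional projections are dense; CLIQUE's additional MDL-based subspace pruning is not attempted here.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi, tau):
    """Hypothetical sketch: bottom-up (Apriori-style) search for dense grid units.

    points: sequence of equal-length tuples with coordinates in [0, 1].
    xi:     number of equal-width intervals per dimension.
    tau:    density threshold; a unit is dense if it holds more than tau * n points.
    Returns {k: set of dense k-dimensional units}; a unit is a frozenset of
    (dimension, interval_index) pairs, one pair per dimension of its subspace.
    """
    n = len(points)
    # Interval index of each point in every dimension (x == 1.0 goes in the last interval).
    cells = [tuple(min(int(x * xi), xi - 1) for x in p) for p in points]

    def keep_dense(candidates):
        # Count, for every candidate unit, the points that fall inside it.
        counts = Counter()
        for cell in cells:
            for unit in candidates:
                if all(cell[d] == i for d, i in unit):
                    counts[unit] += 1
        return {u for u in candidates if counts[u] > tau * n}

    dims = len(points[0])
    level = keep_dense({frozenset([(d, i)]) for d in range(dims) for i in range(xi)})
    result, k = {}, 1
    while level:
        result[k] = level
        candidates = set()
        for a, b in combinations(level, 2):
            u = a | b
            # Join two k-units that agree on k-1 pairs and span k+1 distinct dimensions.
            if len(u) == k + 1 and len({d for d, _ in u}) == k + 1:
                # Prune: every k-dimensional projection of u must itself be dense.
                if all(u - {pair} in level for pair in u):
                    candidates.add(u)
        level = keep_dense(candidates)
        k += 1
    return result
```

For example, with five points of which four share the lowest interval in dimensions 0 and 1, `dense_units(pts, 2, 0.5)` reports that 2-dimensional unit as dense even though dimension 2 carries no matching structure, which is the "points plus subspace" pairing the quotation describes.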


“…In [3], they define clusters in euclidean space by DNF formulas and address performance issues for data mining applications. In [87], the drawbacks of random sampling in clustering algorithms (e.g., small clusters might be missed) are avoided by density biased sampling.…”

Section: Bibliographical Notes (mentioning)

confidence: 99%

On Approximation Algorithms for Data Mining Applications

Afrati

2006

Lecture Notes in Computer Science
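Grid-based methods such as CLIQUE form a cluster from connected dense cells and, per the quotation above, describe it with a DNF formula. The sketch below shows both steps for the dense cells of a single subspace, under stated assumptions: two cells are connected when they share a common face (they differ by exactly 1 in exactly one coordinate), connectivity is transitive, and the description uses one axis-parallel box per cell. The helpers `connected_clusters` and `to_dnf` are hypothetical names. The DNF produced here is correct but not minimal; CLIQUE instead covers each cluster with maximal regions and then drops redundant ones, which this sketch does not attempt.

```python
from collections import deque


def connected_clusters(dense_cells):
    """Hypothetical sketch: group dense cells of one subspace into clusters.

    Each cell is a tuple of interval indices; two cells are connected when they
    share a common face (differ by exactly 1 in exactly one coordinate), and
    connectivity is taken transitively via breadth-first search.
    """
    remaining = set(dense_cells)
    clusters = []
    while remaining:
        start = remaining.pop()
        component, queue = [start], deque([start])
        while queue:
            cell = queue.popleft()
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in remaining:
                        remaining.remove(nb)
                        component.append(nb)
                        queue.append(nb)
        clusters.append(sorted(component))
    return clusters


def to_dnf(cluster, dims, width):
    """Naive DNF description: one conjunct (axis-parallel box) per dense cell.

    dims:  the subspace's dimension indices, in the same order as cell coordinates.
    width: interval width (1 / xi on the unit cube).
    """
    terms = []
    for cell in cluster:
        conds = [f"{i * width:g} <= x{d} < {(i + 1) * width:g}" for d, i in zip(dims, cell)]
        terms.append("(" + " AND ".join(conds) + ")")
    return " OR ".join(terms)
```

For instance, the cells `(0, 0)`, `(0, 1)`, `(1, 1)` and `(3, 3)` yield two clusters, and `to_dnf([(3, 3)], (0, 1), 0.25)` returns `"(0.75 <= x0 < 1 AND 0.75 <= x1 < 1)"`.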


“…Contiguous dense cells are connected to form clusters. Examples of grid-based clustering methods include STING [15] and CLIQUE [16].…”

Section: Categorization Of Clustering Algorithms (mentioning)

confidence: 99%

DBRS: A Density-Based Spatial Clustering Method with Random Sampling

Wang; Hamilton

2003

Advances in Knowledge Discovery and Data Mining


When analyzing spatial databases or other datasets with spatial attributes, one frequently wants to cluster the data according to spatial attributes. In this paper, we describe a novel density-based spatial clustering method called DBRS. The algorithm can identify clusters of widely varying shapes, clusters of varying densities, clusters which depend on non-spatial attributes, and approximate clusters in very large databases. DBRS achieves these results by repeatedly picking an unclassified point at random and examining its neighborhood. If the neighborhood is sparsely populated or the purity of the points in the neighborhood is too low, the point is classified as noise. Otherwise, if any point in the neighborhood is part of a known cluster, this neighborhood is joined to that cluster. If neither of these two possibilities applies, a new cluster is begun with this neighborhood. DBRS scales well on dense clusters. A heuristic is proposed for approximate clustering in very large databases. With this heuristic, the run time can be significantly reduced by assuming that a probabilistically controlled number of points are noise. A theoretical comparison of DBRS and DBSCAN, a well-known density-based algorithm, is given. Finally, DBRS is empirically compared with DBSCAN, CLARANS, and k-means on synthetic and real data sets.
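The DBRS loop summarized in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and several details are assumptions: neighbourhoods are found by brute force within radius `eps`, "sparsely populated" means fewer than `min_pts` points, purity is taken to be the share of neighbours whose non-spatial attribute matches the picked point's, and a neighbourhood that touches several known clusters merges them. The function `dbrs` and its parameters are hypothetical names, and the approximate-clustering heuristic for very large databases is not modelled.

```python
import math
import random

NOISE = -1


def dbrs(points, eps, min_pts, attrs=None, min_purity=0.0, seed=0):
    """Hypothetical sketch of the DBRS loop described in the abstract.

    points:     list of coordinate tuples (spatial attributes).
    eps:        neighbourhood radius; min_pts: minimum neighbourhood size.
    attrs:      optional non-spatial attribute per point (assumption: purity is the
                share of neighbours whose attribute equals the picked point's).
    Returns one label per point: a cluster id >= 0, or NOISE (-1).
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)          # visiting a shuffled order = picking points at random
    labels = {}                 # point index -> cluster id or NOISE; absent = unclassified
    next_id = 0
    for q in order:
        if q in labels:
            continue            # already classified via an earlier neighbourhood
        # Brute-force eps-neighbourhood of q (includes q itself).
        nbhd = [i for i, p in enumerate(points) if math.dist(p, points[q]) <= eps]
        if attrs is None:
            purity = 1.0
        else:
            purity = sum(attrs[i] == attrs[q] for i in nbhd) / len(nbhd)
        if len(nbhd) < min_pts or purity < min_purity:
            labels[q] = NOISE   # sparse or impure neighbourhood
            continue
        touched = {labels[i] for i in nbhd if labels.get(i, NOISE) != NOISE}
        if touched:
            # Join a known cluster. Assumption: if several clusters are touched,
            # they are merged under the smallest id.
            target = min(touched)
            for i, lab in labels.items():
                if lab in touched:
                    labels[i] = target
        else:
            target, next_id = next_id, next_id + 1   # begin a new cluster
        for i in nbhd:
            labels[i] = target
    return [labels[i] for i in range(len(points))]
```

On two tight groups of four points plus one distant outlier, `dbrs(points, 0.5, 3)` should give each group its own cluster id and label the outlier `-1`; because neighbourhoods are absorbed whole, a point first marked as noise can still end up in a cluster later.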
