Proceedings of the 15th ACM International Conference on Information and Knowledge Management - CIKM '06 2006
DOI: 10.1145/1183614.1183739

On subspace clustering with density consciousness

Abstract: In this paper, a problem called "the density divergence problem" is explored. This problem is related to the phenomenon that the densities of the clusters vary in different subspace cardinalities. We take the densities into consideration in subspace clustering and explore an algorithm to adaptively determine different density thresholds to discover clusters in different subspace cardinalities.
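
The phenomenon described in the abstract can be illustrated with a small grid-based sketch. This is not the paper's algorithm, which the abstract does not spell out; the uniform synthetic data, the grid of `bins` intervals per dimension, and the relative factor `alpha` are all assumptions made for illustration. The sketch only shows that the average number of points per grid unit drops as the subspace cardinality grows, so a single global density threshold cannot suit every cardinality.

```python
import random
from collections import Counter
from itertools import combinations

def mean_unit_density(points, dims, bins):
    """Average number of points per non-empty grid unit in the subspace `dims`."""
    cells = Counter(
        tuple(min(int(p[d] * bins), bins - 1) for d in dims) for p in points
    )
    return len(points) / len(cells)

def density_by_cardinality(points, n_dims, bins, max_k):
    """Average unit density over all subspaces of each cardinality 1..max_k."""
    result = {}
    for k in range(1, max_k + 1):
        subspaces = list(combinations(range(n_dims), k))
        total = sum(mean_unit_density(points, s, bins) for s in subspaces)
        result[k] = total / len(subspaces)
    return result

# Synthetic data (assumption): 2000 points drawn uniformly from [0, 1)^4.
rng = random.Random(0)
points = [[rng.random() for _ in range(4)] for _ in range(2000)]
densities = density_by_cardinality(points, n_dims=4, bins=5, max_k=3)

# Hypothetical adaptive rule: one threshold per cardinality, set relative
# to that cardinality's average unit density instead of one global value.
alpha = 1.5
thresholds = {k: alpha * densities[k] for k in densities}
```

Printing `densities` shows the average unit density shrinking with each added dimension, which is why the thresholds in `thresholds` are computed per cardinality.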

Cited by 3 publications (2 citation statements)
References 5 publications
“…Since the threshold for the optimal curve for a multidimensional space is fixed to be a constant value, we used √nd in the denominator to nullify the effect of 'the density divergence problem'. The density divergence problem [34] indicates that it is difficult to set a global threshold when the dimensionality varies because the data are naturally far apart in high-dimensional spaces. With higher dimensions, the distance between data points tends to get higher, which will affect the objective function and the corresponding threshold values.…”
Section: The Objective Function
Citation type: mentioning
confidence: 99%
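
The scaling effect mentioned in the statement above can be checked numerically. The sketch below does not reproduce the citing paper's objective function, which the excerpt does not give; it only assumes uniform random points in the unit hypercube and shows that the mean pairwise Euclidean distance grows with the dimensionality `d`, while dividing by `sqrt(d)` keeps it roughly constant. The `n` factor from the quoted denominator is left out because the excerpt does not define it.

```python
import math
import random

def mean_pair_distance(d, n_pairs, rng):
    """Mean Euclidean distance between random point pairs in [0, 1)^d."""
    total = 0.0
    for _ in range(n_pairs):
        a = [rng.random() for _ in range(d)]
        b = [rng.random() for _ in range(d)]
        total += math.dist(a, b)
    return total / n_pairs

rng = random.Random(0)

# Raw distances grow with the number of dimensions.
raw = {d: mean_pair_distance(d, 2000, rng) for d in (2, 10, 50)}

# Dividing by sqrt(d) removes most of that growth, so a single
# threshold on the normalized value is comparable across dimensions.
normalized = {d: raw[d] / math.sqrt(d) for d in raw}
```

Comparing `raw` with `normalized` shows why a constant threshold is only meaningful after some dimension-dependent normalization.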
“…Note that Data refers to the complete data matrix, Data(t) refers to the data points that belong to the principal curve in feature subset t. When two feature subsets of c − 1 dimensions are joined to generate a candidate feature subset of dimensionality c, the associated datapoints of the new feature subset will be the intersection of the data points associated with both the participating feature subsets (line 22). PCList(c) will contain all the selected subspaces of cardinality c. When we cannot generate any new subspace of cardinality c from the c − 1 subspaces, the new PCList(c) remains empty, and we will output the PCList which is the union of all the PCList(i) where i varies from 2 to c − 1 (lines 20-36). At each stage, whenever we select a feature subset as a desirable subspace, we remove the data points that are significantly distant from the curve, thus, the associated datapoints for that desirable subspace is reduced and relevant.…”
Section: Finding the Desirable Feature Sets Of Higher-dimensional Sub
Citation type: mentioning
confidence: 99%
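
The level-wise procedure described in the statement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the citing paper's implementation: the Apriori-style prefix join, the function names `join_subspaces` and `levelwise_search`, and the `is_desirable` callback are hypothetical, and the principal-curve test and the removal of points far from the curve are not modeled. Only the join of (c − 1)-dimensional subsets, the intersection of their data points, and the final union of all levels follow the quoted text.

```python
def join_subspaces(selected):
    """Join (c-1)-dimensional subsets into c-dimensional candidates.

    `selected` maps a sorted tuple of feature indices to the set of point
    ids associated with it. Two subsets are joined when they share all but
    their last feature (prefix join, an assumption); the candidate keeps
    the intersection of both point sets.
    """
    keys = sorted(selected)
    candidates = {}
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if a[:-1] == b[:-1]:
                candidates[a + (b[-1],)] = selected[a] & selected[b]
    return candidates

def levelwise_search(level2, is_desirable):
    """Collect selected subspaces of every cardinality, starting at 2."""
    pc_list = {}
    current = dict(level2)
    while current:  # stop when no subspace of the next cardinality survives
        pc_list.update(current)
        candidates = join_subspaces(current)
        current = {s: pts for s, pts in candidates.items() if is_desirable(s, pts)}
    return pc_list

# Toy input (assumption): three 2-dimensional subspaces with point ids.
level2 = {
    (0, 1): {1, 2, 3, 4},
    (0, 2): {2, 3, 4, 5},
    (1, 2): {3, 4, 6},
}
# Hypothetical selection rule: keep a candidate with at least two points.
result = levelwise_search(level2, lambda s, pts: len(pts) >= 2)
```

With this toy input, `result` holds the three starting subspaces plus the joined subspace `(0, 1, 2)`, whose point set is the intersection of the points of `(0, 1)` and `(0, 2)`.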