2010
DOI: 10.1007/978-3-642-13818-8_34

Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?

Abstract: The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called 'curse of dimensionality' have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity meas…

Cited by 214 publications (169 citation statements)
References 36 publications (53 reference statements)
“…Regardless of the symbol set employed, it is clear that the approach described can lead to sparse elements embedded in high dimensional vector spaces. While data sets of this kind can be potentially problematic Beyer et al (1999); Hinneburg et al (2000); Houle et al (2010); Steinbach et al (2003), subspace dimension reduction techniques are derivable from LSI approaches such as the SVD. The IR techniques introduced above are readily applicable in any setting where bioinformatics data (sequence, structural, symbolic, etc) can be encoded.…”
Section: Discussion (mentioning)
confidence: 99%
“…This principle of a common set of NN in different dimensions is similar to the concept of the shared nearest neighbor distance [6] or consensus methods. The intuition is that the member dimensions of a subspace agree (to a certain minimum threshold) in their NN rankings, when considered individually.…”
Section: Definition Of Subspace Nearest Neighbor Search (mentioning)
confidence: 99%
“…According to [14], the shared nearest neighborhood (SNN) method can be used due to its robustness in high dimension dataset. However, SNN is not efficient because of its complexity.…”
Section: The Kddbscan Algorithm (mentioning)
confidence: 99%
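
The shared nearest neighbor idea that these citation statements refer to can be illustrated with a minimal sketch. It assumes one common formulation: the similarity of two points is the fraction of their k nearest neighbors that they have in common. The function names `knn_sets` and `snn_similarity`, the Euclidean primary distance, the exclusion of a point from its own neighbor set, and the toy data are all illustrative assumptions, not taken from the cited paper. The brute-force neighbor search also hints at why one statement above calls SNN costly.

```python
import math

def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Assumption: Euclidean distance is the primary measure, and a point is
    not counted as its own neighbor.
    """
    neighbors = []
    for i, p in enumerate(points):
        # Brute force: compare p against every other point (quadratic overall).
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        others.sort()  # ties in distance are broken by index
        neighbors.append({j for _, j in others[:k]})
    return neighbors

def snn_similarity(neighbors, i, j, k):
    """Fraction of the k nearest neighbors shared by points i and j."""
    return len(neighbors[i] & neighbors[j]) / k

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
k = 2
nn = knn_sets(points, k)

same_group = snn_similarity(nn, 0, 1, k)   # points 0 and 1 share neighbor 2
cross_group = snn_similarity(nn, 0, 3, k)  # no neighbors in common
print(same_group, cross_group)
```

Because the score depends only on neighbor rankings rather than raw distance values, points in the same group receive a higher similarity than points in different groups, even when the underlying distances are hard to compare.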