Proceedings of the 2019 SIAM International Conference on Data Mining (2019)
DOI: 10.1137/1.9781611975673.21

Intrinsic Dimensionality Estimation within Tight Localities

Abstract: Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, since their convergence generally requires sample sizes (that is, neighborhood sizes) on the order of hundreds of points, existing ID estimation methods may have only limited usefulness for applications in which the data consists of many natural groups of small size. In this paper,…

Cited by 37 publications (34 citation statements)
References 60 publications

“…Amsaleg et al showed in [2] that MLE estimates the LID well. We remark that in very recent work, Amsaleg et al proposed in [3] a new MLE-based estimator that works with smaller k values than (1).…”
Section: Definition 1 ([11]
Type: mentioning
confidence: 90%
“…Interestingly, Annoy makes much fewer distance computations but is consistently outperformed by IVF. Comparing the number of distance computations to running time performance, we see that an increase in the number of distance computations is not reflected in a proportional decrease in the number of queries per second. This means that the candidate set generation is in general more expensive for graph-based approaches, but the resulting candidate set is of much higher quality and fewer distance computations have to be carried out.…”
Section: Influence of LID on Performance
Type: mentioning
confidence: 91%
“…Practical methods that have been developed for the estimation of the EVT index, including expansion-based estimators [36] and the well-known Hill estimator and its variants [37], can all be applied to LID (for a survey, see [38]). Recently, techniques have been developed that use expansion from neighboring points to stabilize LID estimation, allowing for smaller neighborhood samples to be used [39].…”
Section: B Intrinsic Dimensionality
Type: mentioning
confidence: 99%
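
The Hill-type (maximum-likelihood) estimator mentioned in the statements above can be illustrated with a short sketch. It assumes the commonly used form LID ≈ -1 / mean(log(r_i / r_k)), where r_1 ≤ … ≤ r_k are the distances from a query point to its k nearest neighbors. The function names `lid_mle` and `knn_distances` and the synthetic data are hypothetical, chosen for illustration only; this is not the tight-locality estimator proposed in the paper itself.

```python
import math
import random


def lid_mle(distances):
    """Hill/MLE-style LID estimate from a list of neighbor distances.

    Assumed form: -1 / mean(log(r_i / r_max)) over the positive distances.
    """
    r = sorted(d for d in distances if d > 0)
    if len(r) < 2:
        raise ValueError("need at least two positive distances")
    r_max = r[-1]
    mean_log = sum(math.log(d / r_max) for d in r) / len(r)
    if mean_log == 0:
        raise ValueError("all distances are equal; estimate is undefined")
    return -1.0 / mean_log


def knn_distances(query, points, k):
    """Brute-force Euclidean distances from query to its k nearest points."""
    return sorted(math.dist(query, p) for p in points)[:k]


# Synthetic check (illustrative data, not from the paper): points uniform
# in a 3-dimensional cube, embedded in 10 dimensions by zero padding.
rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(3)] + [0.0] * 7
          for _ in range(5000)]
query = [0.0] * 10

estimate = lid_mle(knn_distances(query, points, k=100))
print(f"estimated LID with k=100: {estimate:.2f}")  # true dimension is 3
```

With k = 100 the estimate should land near the true dimension of 3, though it fluctuates from sample to sample. With much smaller k the variance grows quickly, which is the limitation that the quoted statements say newer estimators aim to reduce.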
“…Graph-based methods exploit scaling properties of graphs, such as the length of the minimum spanning tree (Costa and Hero, 2004). Nearest neighbors methods rely on scaling properties of the distribution of local distances or angles, due for example to measure concentration (Levina and Bickel, 2004; Ceruti et al, 2014; Johnsson, 2016; Facco et al, 2017; Wissel, 2018; Amsaleg et al, 2019; Díaz et al, 2019; Gomtsyan et al, 2019). It has also been recently proposed to use the Fisher separability statistic (i.e., the probability of a data point to be separated from the rest of the data point cloud by a Fisher discriminant) for the estimation of ID (Albergante et al, 2019).…”
Section: Defining and Measuring Intrinsic Dimension
Type: mentioning
confidence: 99%