2016
DOI: 10.1007/s10994-016-5586-4

Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors

Abstract: Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve which is often colloquially referred to as 'more data the better'. We call this 'the gravity of learning curve', and it is assumed that no learning algorithms are 'gravity-defiant'. Contrary to the conventional wisdom, this paper provides the theoretical analysis and the empirical evidence that nearest neighbour anomaly detectors are gravity-defiant algorithms.
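To make the claim concrete, here is a minimal sketch of how such a learning curve could be traced empirically: a 1-NN-distance anomaly scorer over a random subsample of size ψ, evaluated by AUC as ψ grows. The synthetic data, the scorer, and all parameter values are illustrative assumptions of this note, not the paper's experimental setup.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic data: one dense normal cluster plus a few scattered anomalies.
normal = rng.normal(0.0, 1.0, size=(2000, 2))
anomalies = rng.uniform(-6.0, 6.0, size=(40, 2))
X = np.vstack([normal, anomalies])
y = np.r_[np.zeros(len(normal)), np.ones(len(anomalies))]  # 1 = anomaly

def nn_scores(X, psi, rng):
    """Anomaly score = distance to the nearest point in a random
    subsample of size psi (larger distance = more anomalous)."""
    idx = rng.choice(len(X), size=psi, replace=False)
    d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
    d[np.arange(len(X))[:, None] == idx[None, :]] = np.inf  # ignore self-matches
    return d.min(axis=1)

# Trace an empirical "learning curve" over the subsample size psi.
for psi in (2, 8, 32, 128, 512):
    aucs = [roc_auc_score(y, nn_scores(X, psi, rng)) for _ in range(10)]
    print(f"psi={psi:4d}  mean AUC over 10 trials = {np.mean(aucs):.3f}")
```

Under the paper's thesis, the AUC of such a detector need not keep improving as ψ increases; it can peak at a small subsample size.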

Cited by 34 publications (25 citation statements); References 21 publications
“…When the number of clusters increases further, the data distribution becomes ill represented by the subsamples, resulting in a decrease of AUC (i.e., for iNNE (ψ=256) the AUC degrades when the number of clusters is >200, and for iNNE (ψ=1024) when the number of clusters is >700). This phenomenon is further explained in the work of Ting et al. using computational geometry.…”
Section: Conceptual Comparisons With iForest, LOF, and Sp
confidence: 77%
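The "ill represented by the subsamples" effect can be illustrated with a back-of-the-envelope coverage calculation; the assumption of c equally likely clusters is mine, and the cited geometric analysis is more refined than this sketch.

```python
# Expected number of clusters with no point in a random subsample of size psi,
# assuming c equally likely clusters: c * (1 - 1/c) ** psi.
def expected_uncovered(c, psi):
    return c * (1 - 1 / c) ** psi

for psi in (256, 1024):
    for c in (100, 200, 700, 1000):
        print(f"psi={psi:4d}  clusters={c:4d}  "
              f"expected uncovered clusters = {expected_uncovered(c, psi):6.1f}")
```

Under this simplified model, with ψ=256 and 200 clusters, dozens of clusters are expected to receive no subsample point at all, which is consistent with the reported degradation thresholds.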
“…First, mass-based dissimilarity measures [36,37] have been shown to outperform distance measures using the same NN algorithms in classification, clustering, anomaly detection, and information retrieval tasks [13,27,38]. Incorporating these into iNNE will enhance its effectiveness and guide the setting of an appropriate sample size for different data sets, independent of the given data set size [37]. Second, theories have been developed to explain why NN anomaly detectors can perform well with small samples.…”
Section: Discussion
confidence: 99%
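For readers unfamiliar with the term, the following is a rough sketch of the idea behind mass-based dissimilarity as described in that line of work: the expected probability mass of the smallest region, under random partitions, that covers both points. The tree construction, depth, and ensemble size below are simplifications of my own, not the formulation of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_tree(X, idx, depth, rng):
    """Random axis-parallel partitioning tree; each node stores the
    number of training points falling into it (its 'mass')."""
    node = {"mass": len(idx)}
    if depth == 0 or len(idx) <= 1:
        return node
    dim = rng.integers(X.shape[1])
    lo, hi = X[idx, dim].min(), X[idx, dim].max()
    if lo == hi:
        return node
    split = rng.uniform(lo, hi)
    node.update(dim=dim, split=split,
                left=build_tree(X, idx[X[idx, dim] < split], depth - 1, rng),
                right=build_tree(X, idx[X[idx, dim] >= split], depth - 1, rng))
    return node

def lowest_common_mass(node, a, b):
    """Mass of the deepest node that still contains both points a and b."""
    while "dim" in node:
        side_a = a[node["dim"]] < node["split"]
        side_b = b[node["dim"]] < node["split"]
        if side_a != side_b:
            break
        node = node["left"] if side_a else node["right"]
    return node["mass"]

def mass_dissimilarity(a, b, trees, n):
    # Average relative mass of the smallest shared region over all trees.
    return np.mean([lowest_common_mass(t, a, b) / n for t in trees])

X = rng.normal(size=(500, 2))
trees = [build_tree(X, np.arange(len(X)), depth=8, rng=rng) for _ in range(25)]
print("nearby pair: ", mass_dissimilarity(X[0], X[0] + 0.05, trees, len(X)))
print("distant pair:", mass_dissimilarity(X[0], X[0] + 5.0, trees, len(X)))
```

Points that are far apart force a larger common region and hence a larger mass, so the measure reflects how the data is distributed rather than geometric distance alone.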
“…A detailed analysis of the advantages and drawbacks of these measures for unsupervised outlier detection can be found in [6]. Following the literature [6,16,30,32,37], the popular measure AUC is used. AUC inherently considers the class-imbalance nature of outlier detection, making it comparable across data sets with different outlier proportions [6].…”
Section: Performance Evaluation Methods
confidence: 99%
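As a small illustration of why AUC suits this setting: it depends only on how the detector ranks outliers against inliers, not on the outlier proportion or on any score threshold. The toy scores below are placeholders, not output of any of the cited detectors.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Heavily imbalanced ground truth: 1% outliers (label 1).
y_true = np.r_[np.zeros(990), np.ones(10)]
# Anomaly scores from some detector; here just noisy placeholders.
scores = 2.0 * y_true + rng.normal(size=y_true.shape)

# AUC = probability that a random outlier is scored above a random inlier,
# so it is unaffected by the outlier proportion and needs no threshold.
print(roc_auc_score(y_true, scores))
```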
“…The time complexity may be reduced to be nearly linear by using indexing [4] or distributed computing techniques [8]. Recent studies [26,30,32] show that random distance-based methods or distance-based ensemble methods can achieve not only a similar time complexity reduction but also low false positive errors, resulting in scalable state-of-the-art distance-based detectors. However, these techniques still do not address the curse of dimensionality issue.…”
Section: Related Work 2.1 Distance-based Outlier Detection
confidence: 99%
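A rough sketch of the complexity trade-off discussed here, contrasting brute-force kNN search, an index structure, and scoring against a small random subsample; the data sizes, the KD-tree choice, and the ψ value are assumptions for illustration rather than the cited methods.

```python
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 8))

def timed(label, fn):
    t0 = time.perf_counter()
    fn()
    print(f"{label:22s} {time.perf_counter() - t0:.2f}s")

# O(n^2) distances: full kNN search for classic distance-based scores.
timed("brute force", lambda: NearestNeighbors(n_neighbors=5, algorithm="brute")
      .fit(X).kneighbors(X))

# Index structure (near-linear in practice for low dimensions).
timed("kd-tree index", lambda: NearestNeighbors(n_neighbors=5, algorithm="kd_tree")
      .fit(X).kneighbors(X))

# O(n * psi): distance to the nearest point in a small random subsample.
sample = X[rng.choice(len(X), size=256, replace=False)]
timed("subsample, psi=256", lambda: NearestNeighbors(n_neighbors=1, algorithm="brute")
      .fit(sample).kneighbors(X))
```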