2005
DOI: 10.1109/tkde.2005.31
|View full text |Cite
|
Sign up to set email alerts
|

Outlier mining in large high-dimensional data sets

Abstract: In this paper a new definition of distance-based outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large and high-dimensional data set are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k nearest-neighbors. Outlier are those points scoring the largest values of weight. The algorithm HilOut makes use of the notion of space-filling curve to linearize the data set, and it consists of two phases. The fi… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
5

Citation Types

0
189
0
7

Year Published

2009
2009
2015
2015

Publication Types

Select...
5
1

Relationship

0
6

Authors

Journals

citations
Cited by 305 publications
(196 citation statements)
references
References 26 publications
0
189
0
7
Order By: Relevance
“…More specifically, for two arbitrary data points p 1 and p 2 in DS, F out (p 1 ) and F out (p 2 ) can be compared with each other, and if F out (p 1 ) > F out (p 2 ), p 1 has a larger possibility than p 2 to be an outlier. The definitions proposed by Angiulli et al [6], Breunig et al [3], and Ramaswamy et al [7] straightforwardly adhere to this category. On the other hand, the definition of Ng and Knorr [4] can be converted to this category by taking the inverse of the number of neighbors within distance r of each data point.…”
Section: Introductionmentioning
confidence: 99%
See 4 more Smart Citations
“…More specifically, for two arbitrary data points p 1 and p 2 in DS, F out (p 1 ) and F out (p 2 ) can be compared with each other, and if F out (p 1 ) > F out (p 2 ), p 1 has a larger possibility than p 2 to be an outlier. The definitions proposed by Angiulli et al [6], Breunig et al [3], and Ramaswamy et al [7] straightforwardly adhere to this category. On the other hand, the definition of Ng and Knorr [4] can be converted to this category by taking the inverse of the number of neighbors within distance r of each data point.…”
Section: Introductionmentioning
confidence: 99%
“…Researchers have developed several supervised and unsupervised techniques to mine outliers in static databases and also recently in data streams [9]. Unsupervised outlier detection can be further classified as distance-based [5,6,4,7], density-based [3,8,9] and deviation-based [10]. In this paper, we focus on distance-based outliers which have been popularly defined as: (a) data points from which there are fewer than p points that are within distance r [4], (b) top n data points whose distance to their corresponding k th nearest neighbor are largest [7], and (c) top n data points whose total distance to their corresponding k nearest neighbors are largest [6].…”
Section: Introductionmentioning
confidence: 99%
See 3 more Smart Citations