2016
DOI: 10.1613/jair.5228

ZERO++: Harnessing the Power of Zero Appearances to Detect Anomalies in Large-Scale Data Sets

Abstract: This paper introduces a new unsupervised anomaly detector called ZERO++ which employs the number of zero appearances in subspaces to detect anomalies in categorical data. It is unique in that it works in regions of subspaces that are not occupied by data, whereas existing methods work in regions occupied by data. ZERO++ examines only a small number of low-dimensional subspaces to successfully identify anomalies. Unlike existing frequency-based algorithms, ZERO++ does not involve subspace pattern searching. We s…
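
The zero-appearance idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `zero_appearance_scores`, the parameters `n_subsamples` and `subsample_size`, the use of every attribute pair as a two-dimensional subspace, and uniform random subsampling are all assumptions made for illustration. The score of a row is the number of times its value combination in a subspace does not appear at all in a small random subsample.

```python
import random
from itertools import combinations


def zero_appearance_scores(data, n_subsamples=50, subsample_size=8, seed=0):
    """Count, per row, how often its value pair is absent from random subsamples.

    Hypothetical helper for illustration only; a higher score means the row
    falls more often in regions of a subspace that the subsample leaves empty.
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    # Assumption: use every pair of attributes as a 2-D subspace.
    subspaces = list(combinations(range(n_features), 2))
    scores = [0] * len(data)
    for _ in range(n_subsamples):
        # Draw a small subsample without replacement.
        sample = rng.sample(data, min(subsample_size, len(data)))
        for i, j in subspaces:
            # Value combinations that occur at least once in the subsample.
            seen = {(row[i], row[j]) for row in sample}
            for k, row in enumerate(data):
                if (row[i], row[j]) not in seen:
                    scores[k] += 1  # zero appearances in this subspace
    return scores


if __name__ == "__main__":
    # Twenty identical rows plus one row with rare values in every attribute.
    rows = [("a", "x", "p")] * 20 + [("b", "y", "q")]
    result = zero_appearance_scores(rows)
    print(result.index(max(result)))  # index of the highest-scoring row
```

Rows made of common values almost always appear in a subsample and so accumulate few zero appearances, while a row with rare value combinations is usually missing from the subsample and accumulates many.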

Cited by 18 publications (12 citation statements)
References 34 publications
“…Embedding-based representation, which is the most widely used in categorical data representation, generates a numerical vector to represent each categorical object. A popular embedding method called 1-hot encoding translates each feature value to a zero-one indicator vector [ 6 ]. It first counts the values of one feature as .…”
Section: Related Work (mentioning)
confidence: 99%
“…Generally, existing methods fall into two categories: the embedding-based method and the similarity-based method. Typical embedding methods, e.g., 1-hot encoding and Inverse Document Frequency (IDF) encoding [ 6 , 7 ], transform categorical data to numerical data by some encoding schemes directly. But these methods treat features independently and ignore the couplings between feature values.…”
Section: Introduction (mentioning)
confidence: 99%
“…COSH uses the multi-granularity value clusters to compute the most outlying aspect of values, which enables it to obtain reliable outlier scores in data sets with many irrelevant/noisy features. Substantial experiments show that (1) CDE significantly outperforms three popular embedding methods: one-hot encoding (noted as 0-1), one-hot encoding with PCA (0-1P), and inverse document frequency embedding (IDF), with maximum F-score improvement of 19%, and gains maximally 8% F-score improvement over three state-of-the-art similarity measures for clustering: COS [1], DILCA [8] and ALGO [7] on 10 real-world data sets with different value coupling complexities; (2) COSH significantly outperforms (maximally 67% AUC improvement) five state-of-the-art outlier detection methods: CBRW [14], and ZERO [15], iForest [16], ABOD [17] and LOF [18] on 10 high-dimensional data sets; (3) CDE and COSH obtain good scalability: it is linear to data size and quadratic to the number of features; and (4) CDE and COSH perform stably and are insensitive to its parameters.…”
Section: Contributions (mentioning)
confidence: 99%
“…At the beginning of the 1990, a number of IDSs were developed, mostly relying on a combination of statistical and expert systems approaches [14]. According to [15] the processing is more statistically sophisticated and simple -real-time alerts became possible. The use of competitive neural networks in researches related to intrusion detection in computer networks is present in a number of studies [16].…”
Section: Related Work (mentioning)
confidence: 99%
“…where the dominant role is on those other techniques [32], [25], but very often they use fuzzy logic or SOM for additional fine tuning [15]. Common for all previously mentioned researches is the use of techniques of the artificial intelligence, where the systems based on the application of neural networks with competitive learning [33] and fuzzy logic play an important role.…”
Section: Related Work (mentioning)
confidence: 99%