Proceedings of the 21st ACM International Conference on Information and Knowledge Management 2012
DOI: 10.1145/2396761.2396816

Fast and reliable anomaly detection in categorical data

Abstract: Spotting anomalies in large multi-dimensional databases is a crucial task with many applications in finance, health care, security, etc. We introduce COMPREX, a new approach for identifying anomalies using pattern-based compression. Informally, our method finds a collection of dictionaries that describe the norm of a database succinctly, and subsequently flags those points dissimilar to the norm (those with high compression cost) as anomalies. Our approach exhibits four key features: 1) it is parameter-free; it builds d…

Citations: cited by 90 publications (93 citation statements)
References: 27 publications
“…Most existing categorical data oriented methods are based on a general assumption that anomalies lie in regions of low frequency (Akoglu et al, 2012; Ghoting, Otey, & Parthasarathy, 2004; He et al, 2005; Koufakou, Ortiz, Georgiopoulos, Anagnostopoulos, & Reynolds, 2007; Koufakou & Georgiopoulos, 2010; Smets & Vreeken, 2011; He, Deng, Xu, & Huang, 2006). Typical examples are frequent patterns based methods FPOF (He et al, 2005) and infrequent patterns based methods LOADED (Ghoting et al, 2004).…”
Section: Methods For Categorical Data (citation type: mentioning)
confidence: 99%
“…FPOF and LOADED build a single model on the entire training set, and identify anomalies based on frequent patterns and infrequent patterns, respectively. KRIMP (Smets & Vreeken, 2011) and COMPREX (Akoglu et al, 2012) also build a single model on the entire training set using pattern-based compression techniques. KRIMP generates the patterns based on frequent itemsets, while COMPREX employs the Minimum Description Length (Barron, Rissanen, & Yu, 1998) principle to automatically generate patterns from attribute groups (subspaces) with high information gain and avoid the costly frequent itemset search.…”
Section: Methods For Categorical Data (citation type: mentioning)
confidence: 99%
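
The idea of scoring a record by its compression cost can be illustrated with a minimal Python sketch. This is not the COMPREX algorithm: it assumes every attribute is encoded independently with a Shannon-style code length of -log2(p), whereas the statement above describes COMPREX as deriving patterns from attribute groups chosen via the Minimum Description Length principle. The function names `fit_code_tables`, `compression_cost`, and `rank_anomalies`, and the add-one smoothing, are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def fit_code_tables(rows):
    """Count value frequencies per column.

    Assumption: one table per single attribute; no attribute grouping.
    """
    n_cols = len(rows[0])
    return [Counter(row[j] for row in rows) for j in range(n_cols)]


def compression_cost(row, tables, n_rows):
    """Approximate number of bits needed to encode one row.

    Each value costs -log2(p), where p is its smoothed relative frequency.
    Add-one smoothing (an illustrative choice) keeps unseen values finite.
    """
    cost = 0.0
    for value, table in zip(row, tables):
        p = (table[value] + 1) / (n_rows + len(table) + 1)
        cost += -math.log2(p)
    return cost


def rank_anomalies(rows):
    """Return (cost, row_index) pairs, most expensive row first."""
    tables = fit_code_tables(rows)
    n_rows = len(rows)
    scored = [(compression_cost(r, tables, n_rows), i) for i, r in enumerate(rows)]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy data: nine identical records and one rare record.
    data = [("a", "x")] * 9 + [("b", "y")]
    for cost, index in rank_anomalies(data)[:3]:
        print(index, round(cost, 2))
```

Records made of rare values receive long codes and therefore rise to the top of the ranking, which mirrors the low-frequency assumption quoted above. Capturing dependencies between attributes would require the grouping step that this sketch omits.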