Proceedings of the 2017 SIAM International Conference on Data Mining 2017
DOI: 10.1137/1.9781611974973.15

Efficiently Discovering Unexpected Pattern-Co-Occurrences

Abstract: Our world is filled with both beautiful and brainy people, but how often does a Nobel Prize winner also win a beauty pageant? Let us assume that someone who is both very beautiful and very smart is rarer than what we would expect from the combination of the number of beautiful and brainy people. Of course, there will always be some individuals who defy this stereotype; these beautiful brainy people are exactly the class of anomaly we focus on in this paper. They do not possess intrinsically rare qual…

Cited by 6 publications (7 citation statements)
References 21 publications

“…Another recent contribution by Siddiqui et al [2018] shows that human-in-the-loop feedback can be used in a semi-supervised way to improve detection results over baseline unsupervised detectors over numerical data. On the other hand, there are a number of generic approaches to anomaly detection for discrete (categorical) data [He et al, 2005, Narita and Kitagawa, 2008, Koufakou et al, 2007, Smets and Vreeken, 2011, Bertens et al, 2017, Akoglu et al, 2012, Bertens et al, 2017]. Most of these approaches first mine the data for frequent itemsets or association rules, and all then perform anomaly scoring in a second pass over the data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Since OC3 was often the most effective batch algorithm, we think it would be interesting to develop a streaming approach based on MDL, either by adapting the underlying Krimp compression algorithm to support streaming anomaly detection, or by building on streaming compression techniques such as adaptive arithmetic coding [Witten et al, 1987]. The UPC algorithm of Bertens et al [2017] is also based on pattern mining and MDL, and is inherently a two-pass approach, but seeks a different kind of anomaly than AVF, OC3, and CompreX, consisting of unexpectedly rare combinations of frequent itemsets.…”
Section: Related Work (mentioning)
confidence: 99%
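
The notion of "unexpectedly rare combinations of frequent itemsets" in the statement above can be illustrated with a small sketch. It compares the observed joint support of two patterns with the support expected if they occurred independently. This is only an assumption-laden illustration: the independence baseline, the function names `support` and `cooccurrence_ratios`, and the toy data are hypothetical, and this is not the MDL-based UPC algorithm of Bertens et al.

```python
from itertools import combinations


def support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    return sum(pattern <= t for t in transactions) / len(transactions)


def cooccurrence_ratios(patterns, transactions):
    """Map each pair of patterns to observed / expected joint support.

    The expected support assumes the two patterns occur independently
    (an illustrative baseline, not the MDL-based score used by UPC).
    """
    ratios = {}
    for a, b in combinations(patterns, 2):
        expected = support(a, transactions) * support(b, transactions)
        if expected == 0:
            continue  # at least one pattern never occurs; ratio undefined
        observed = support(a | b, transactions)
        ratios[(a, b)] = observed / expected
    return ratios


# Toy data (hypothetical): each pattern is common, but they rarely co-occur.
transactions = (
    [frozenset({"beautiful"})] * 4
    + [frozenset({"brainy"})] * 4
    + [frozenset({"beautiful", "brainy"})] * 1
    + [frozenset()] * 1
)
patterns = [frozenset({"beautiful"}), frozenset({"brainy"})]
ratios = cooccurrence_ratios(patterns, transactions)
```

A ratio well below 1 marks a pair that co-occurs less often than the baseline predicts; the transactions that still contain both patterns would be the candidate anomalies under this simplified reading.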
“…Recall that condition (2) requires that $\mathrm{VIO}(\phi, D_{\mathrm{dirty}}) \cap \sigma^{\mathrm{tid}}_{M}(D_{\mathrm{dirty}})$ is not empty. Given equivalence classes over $D_{\mathrm{dirty}}$, this condition can be checked along the same lines as done for condition (3). Indeed, we compute $\mathrm{conf}(\phi, D_{\mathrm{dirty}})$ and along the way we check for violations involving tids in $\sigma_{M}(D_{\mathrm{dirty}})$, just as before.…”
Section: Implementation Details (mentioning)
confidence: 99%
“…When applying both modifications, however, tuples $t_3$, $t_8$ have CC = 01, but a different PN, violating the CFD $\phi$. ♦ To define an efficient, yet useful scoring function for variable CFDs, we will treat a variable CFD $\phi = (X \to A, (t_p, ))$ as a union of a finite number of constant CFDs, say $\Sigma = \{\phi_1, \dots, \phi_m\}$. Moreover, when we allow unions of constant CFDs to serve as the constraint language for global explanations, they inherit the nice properties of single constant CFDs, with some restrictions.…”
Section: Rationale Behind Uc-score (mentioning)
confidence: 99%
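
The kind of violation mentioned in the statement above (two tuples that share CC = 01 but carry different PN values) can be sketched by grouping tuples into equivalence classes on the left-hand side. Everything here is an assumption made for illustration: the rule CC → PN, the wildcard marker `_`, the function names, and the toy values are hypothetical and do not come from the citing paper.

```python
from collections import defaultdict

WILDCARD = "_"  # hypothetical marker for an unconstrained pattern position


def matches(row, attrs, pattern):
    """True if `row` agrees with every constant in `pattern` on `attrs`."""
    return all(p == WILDCARD or row[a] == p for a, p in zip(attrs, pattern))


def variable_cfd_violations(rows, lhs, rhs, lhs_pattern):
    """Return LHS equivalence classes whose tuples disagree on `rhs`.

    Tuples are grouped by their values on `lhs` (only those matching
    `lhs_pattern`); a group with more than one distinct `rhs` value
    violates the assumed variable CFD.
    """
    groups = defaultdict(list)
    for tid, row in rows.items():
        if matches(row, lhs, lhs_pattern):
            groups[tuple(row[a] for a in lhs)].append(tid)
    return {
        key: tids
        for key, tids in groups.items()
        if len({rows[t][rhs] for t in tids}) > 1
    }


# Toy relation (hypothetical values; attribute names taken from the excerpt).
rows = {
    "t3": {"CC": "01", "PN": "555-0100"},
    "t8": {"CC": "01", "PN": "555-0199"},
    "t9": {"CC": "44", "PN": "555-0142"},
}
violations = variable_cfd_violations(rows, lhs=["CC"], rhs="PN", lhs_pattern=["_"])
```

Under this reading, each equivalence class with a single agreed right-hand-side value behaves like one constant rule, which is one plausible way to picture a variable CFD as a union of constant CFDs; the citing paper may define the decomposition differently.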