Proceedings of the 2009 SIAM International Conference on Data Mining 2009
DOI: 10.1137/1.9781611972795.14

Prior-Free Rare Category Detection

Abstract: Rare category detection is an open challenge in machine learning. It plays a central role in applications such as detecting new financial fraud patterns, detecting new network malware, and scientific discovery. In such cases, rare categories are hidden among huge volumes of normal data and observations. In this paper, we propose a new method for rare category detection named SEDER, which requires no prior information about the data set. It implicitly performs semiparametric density estimation using specially …

Cited by 22 publications (4 citation statements)
References 11 publications
“…The user provides labels that indicate whether a data example belongs to an undiscovered rare category. (He and Carbonell 2009) and (Liu et al 2014) instead resort to semi-parametric density estimation and wavelet transform respectively to rank all data examples. (Vatturi and Wong 2009) employs hierarchical mean shift clustering to detect rare categories of different scales.…”
Section: Rare Category Detection (mentioning)
confidence: 99%
“…This requirement renders those methods unsuitable for the scientific discovery problem, in which we do not know how many classes are present. SEDER (He and Carbonell 2009), which performs a semiparametric density estimation to discover classes, and CLOVER (Huang et al 2012), which uses LVD (local variation degree) to improve the computational cost and rate of class discovery, do not require knowledge about the number of classes, but they retain the requirement for a labeling oracle. In contrast, for scientific discovery the user cannot always ascribe a label when presented with a new item.…”
Section: Related Work (mentioning)
confidence: 99%
“…Figure 1 shows the empirical class discovery rate for the glass data set. Results for CLOVER and random sampling were obtained from Huang et al (2012); results for SEDER, NNDM, and Interleave were obtained from He and Carbonell (2009). The figure shows results for DEMUD (k = 2) and for a static baseline strategy that ranks all items by their reconstruction error using the full data set SVD (k = 2).…”
Section: Benchmark Class Discovery (mentioning)
confidence: 99%