2020
DOI: 10.48550/arxiv.2005.11442
Preprint

Active Learning for Skewed Data Sets

Abbas Kazerouni, Qi Zhao, Jing Xie, et al.

Abstract: Consider a sequential active learning problem where, at each round, an agent selects a batch of unlabeled data points, queries their labels and updates a binary classifier. While there exists a rich body of work on active learning in this general form, in this paper, we focus on problems with two distinguishing characteristics: severe class imbalance (skew) and small amounts of initial training data. Both of these problems occur with surprising frequency in many web applications. For instance, detecting offens…

Cited by 4 publications (10 citation statements)
References 29 publications
“…In this direction, although uncertainty sampling [8,23,9] is an effective method for mining generic informative examples [8,22,23], they are known to be ineffective in mining minority-class examples [1,2,3,6]. This has been attributed to the fact that uncertainty sampling, being biased on previously seen examples, ignores regions that are underrepresented in the initially labeled dataset [14]. Several approaches [7,15,26,11] propose to account for the skewness by using prior information about class imbalance to boost query scores corresponding to tail classes.…”
Section: Related Work (mentioning)
confidence: 99%
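
The citing passage above contrasts plain uncertainty sampling with approaches that use prior information about class imbalance to boost query scores. The following is a minimal Python sketch of that contrast, not the method of any cited paper: the additive bonus in `boosted_score`, the `strength` parameter, and all function names are illustrative assumptions.

```python
def uncertainty(p_pos):
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - 2.0 * abs(p_pos - 0.5)


def boosted_score(p_pos, minority_prior, strength=1.0):
    """Hypothetical boosting rule (an assumption, not taken from the source).

    Adds a bonus proportional to the predicted minority-class probability;
    the bonus grows as the assumed minority prior shrinks.
    """
    bonus = strength * (1.0 - minority_prior) * p_pos
    return uncertainty(p_pos) + bonus


def select_batch(probs, k, score_fn):
    """Return the indices of the k unlabeled points with the highest score."""
    ranked = sorted(range(len(probs)), key=lambda i: score_fn(probs[i]), reverse=True)
    return ranked[:k]


# Predicted minority-class probabilities for four unlabeled points.
probs = [0.05, 0.4, 0.7, 0.95]
plain = select_batch(probs, 2, uncertainty)
boosted = select_batch(probs, 2, lambda p: boosted_score(p, minority_prior=0.01))
```

In this toy setting the boosted ranking prefers points that the classifier already leans toward labeling as minority, whereas plain uncertainty only prefers points near the decision boundary.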
“…As a workaround, recent approaches [5,14] propose augmenting uncertainty sampling with an exploration/ geometry/ redundancy criteria in the input space. The key insight is to allow exploration to new uncertain areas in the input space.…”
Section: Related Work (mentioning)
confidence: 99%
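
The passage above describes augmenting uncertainty sampling with an exploration criterion in the input space. One way this might look is sketched below, under stated assumptions: the novelty term (Euclidean distance to the nearest labeled point), the convex combination, and the `weight` parameter are hypothetical choices made for illustration and are not drawn from the cited approaches.

```python
import math


def uncertainty(p_pos):
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - 2.0 * abs(p_pos - 0.5)


def novelty(x, labeled):
    """Euclidean distance from x to its nearest labeled point (assumed exploration term)."""
    return min(math.dist(x, z) for z in labeled)


def combined_score(x, p_pos, labeled, weight=0.5):
    """Convex combination of uncertainty and novelty.

    The distance is left unnormalized here for simplicity; a practical
    version would likely rescale it to be comparable with the uncertainty.
    """
    return (1.0 - weight) * uncertainty(p_pos) + weight * novelty(x, labeled)


labeled = [(0.0, 0.0)]
# A maximally uncertain point close to the labeled data ...
near = combined_score((0.1, 0.0), 0.5, labeled)
# ... versus a confidently predicted point in an unexplored region.
far = combined_score((3.0, 4.0), 0.9, labeled)
```

With these inputs the far point scores higher despite its low uncertainty, which is the kind of exploration into underrepresented regions the passage refers to.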