2018 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata.2018.8622459
Hybridization of Active Learning and Data Programming for Labeling Large Industrial Datasets

Cited by 15 publications (24 citation statements) | References 38 publications
“…Unsupervised learning techniques, such as subspace clustering, have been shown to find influential points from a cluster [51]. A hybrid method that connects active learning and data programming [48] has shown improvements in the reduction of noisy data in large scale workspaces [15]. Similar to our work, active learning approaches [23], [63], [78] have been effective while training biased and highly varied datasets.…”
Section: Related Work (supporting)
confidence: 54%
“…Previous work has proposed a variety of methods for giving users (who are in our case the product moderators) control over classifiers by making use of a pipeline that allows them to provide feedback about training data labels and classification results. In WeSAL (Nashaat et al., 2018, 2020) user feedback improves the labels that sets of rules assign to data points. In contrast, our focus is on feedback that allows moderators to improve the rules directly.…”
Section: Related Work (mentioning)
confidence: 99%
“…These ideas have been used to expand intent training data for conversational agents (Mallinar et al. 2019) and to quickly label large industrial data sets (Nashaat et al. 2018). We build on these works by using an iterative procedure to automatically construct weak models and show that the Snorkel generative model can select more relevant sentences for labeling than search-based methods alone.…”
Section: Data Programming, Weak Supervision and Machine Teaching (mentioning)
confidence: 99%