2022
DOI: 10.48550/arxiv.2203.01382
Preprint
Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming

Abstract: Weak Supervision (WS) techniques allow users to efficiently create large training datasets by programmatically labeling data with heuristic sources of supervision. While the success of WS relies heavily on the provided labeling heuristics, the process of how these heuristics are created in practice has remained under-explored. In this work, we formalize the development process of labeling heuristics as an interactive procedure, built around the existing workflow where users draw ideas from a selected set of de…
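The programmatic labeling the abstract describes can be illustrated with a minimal sketch: a few heuristic labeling functions (LFs) vote on each example, and an aggregator combines their votes. The LFs, label constants, and example texts below are hypothetical illustrations of the general technique, not Nemo's actual interface.

```python
# Minimal sketch of weak supervision via heuristic labeling functions.
# All names and heuristics here are illustrative assumptions.

SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_contains_free(text):
    # Heuristic: promotional wording ("free") suggests spam.
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_has_greeting(text):
    # Heuristic: a personal greeting suggests a legitimate message.
    return HAM if text.lower().startswith("hi") else ABSTAIN

def majority_vote(lfs, text):
    """Aggregate LF outputs by majority, ignoring abstentions; tie -> ABSTAIN."""
    votes = [lf(text) for lf in lfs]
    spam, ham = votes.count(SPAM), votes.count(HAM)
    if spam > ham:
        return SPAM
    if ham > spam:
        return HAM
    return ABSTAIN

lfs = [lf_contains_free, lf_has_greeting]
print(majority_vote(lfs, "Win a FREE prize now"))  # -> 1 (SPAM)
print(majority_vote(lfs, "Hi Ann, lunch today?"))  # -> 0 (HAM)
```

Applied over an unlabeled corpus, such noisy votes form the programmatically labeled training set that WS methods then denoise and feed to an end model.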

Cited by 2 publications (2 citation statements); References 15 publications.
“…Ein-Dor et al (2020) offer an empirical study of active learning with PLMs. Very recently, several works have also attempted to query labeling functions for weakly-supervised learning (Boecking et al, 2020; Hsieh et al, 2022; Zhang et al, 2022b). In our study, we leverage the power of unlabeled instances via self-training to further promote the performance of AL.…”
Section: Related Work
confidence: 99%
“…First, by understanding what the deciding factors behind a model's specific prediction are, users can verify whether such influence is desirable with respect to critical aspects such as model safety and fairness [11,35,16]. Secondly, since developing LFs inevitably remains a partly manual process in real-world applications and may involve human experts in an iterative loop [5,20,48], it is beneficial to offer users feedback on how each individual component (such as each LF) influences the end model's performance on the downstream task. Finally, the resulting understanding of the (in)efficacy of the PWS pipeline can naturally be exploited to automatically improve the end model's performance.…”
Section: Introduction
confidence: 99%
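One simple way to produce the per-LF feedback this passage calls for is leave-one-out ablation: score the end pipeline with all LFs, then with each LF removed, and report the difference. The sketch below uses a hypothetical majority-vote "end model" and toy LFs over integers; all names and data are illustrative assumptions, not the cited paper's method.

```python
# Hedged sketch: estimating each labeling function's influence on downstream
# accuracy via leave-one-out ablation. LFs, labels, and data are hypothetical.

ABSTAIN = -1

def lf_a(x):  # fires with label 1 on even numbers
    return 1 if x % 2 == 0 else ABSTAIN

def lf_b(x):  # noisy heuristic: always predicts 1
    return 1

def lf_c(x):  # fires with label 0 on multiples of 3
    return 0 if x % 3 == 0 else ABSTAIN

def majority_label(lfs, x):
    """Majority vote over non-abstaining LFs; ties break toward the smaller label."""
    votes = [v for v in (lf(x) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(sorted(set(votes)), key=votes.count)

def accuracy(lfs, data):
    preds = [majority_label(lfs, x) for x, _ in data]
    return sum(p == y for p, (_, y) in zip(preds, data)) / len(data)

def lf_influence(lfs, data):
    """Influence of each LF = full-set accuracy minus accuracy without that LF."""
    base = accuracy(lfs, data)
    return {lf.__name__: base - accuracy([g for g in lfs if g is not lf], data)
            for lf in lfs}

data = [(2, 1), (4, 1), (6, 0), (9, 0), (7, 0)]  # (input, true label) pairs
print(lf_influence([lf_a, lf_b, lf_c], data))
```

A positive score means the LF helps the aggregate; a negative score flags an LF users may want to revise or drop, which is exactly the iterative feedback loop the passage describes.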