Leveraging organizational resources to adapt models to new data modalities

Suri, Sahaana; Chanda, Raghuveer; Bulut, Neslihan; Narayana, Pradyumna; Zeng, Yemao; Bailis, Peter; Basu, Sugato; Narlikar, Girija; Ré, Christopher; Sethi, Abishek

doi:10.14778/3415478.3415559

Cited by 5 publications

(2 citation statements)

References 71 publications

“…We implement semantic rules via similarity based on contextual token embeddings (BERT [5] and ELMo [16]). In addition to the active learning approach previously implemented in Ruler, TagRuler proposes a novel active learning component focusing on false positives, thus contributing to one of the main challenges in data programming: surfacing difficult (or borderline) labeled examples [20]. In this active learning approach, unlabeled examples that have higher potential to identify false positives will have higher probability to be sampled as next instance to be labeled.…”

Section: Data Programming by Demonstration (DPBD) (mentioning)

confidence: 99%

“…TagRuler is a system designed to learn from expert knowledge, and expert manual annotation is expensive, so it is important to obtain informative data with as few annotations as possible. The active learning approach focuses on contributing to this main challenge in data programming: generating difficult (or borderline) examples [20]. TagRuler samples the text to be displayed in Figure 2 (A) after each annotation using an active learning technique that leverages the trained label model and a small labeled development set.…”

Section: Active Sampler (mentioning)

confidence: 99%
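
The sampling idea described in the two statements above (examples more likely to expose false positives are more likely to be shown next) can be illustrated with a minimal sketch. This is not TagRuler's implementation: the per-example `fp_scores` input and both function names are hypothetical, and how such a score would be derived from the label model and development set is left open here.

```python
import random


def sampling_probabilities(fp_scores):
    """Turn non-negative scores into a probability distribution.

    fp_scores is a hypothetical input: one score per unlabeled example,
    larger when the example is judged more likely to reveal a false positive.
    """
    total = sum(fp_scores)
    if total <= 0:
        # No signal at all: fall back to uniform sampling.
        return [1.0 / len(fp_scores)] * len(fp_scores)
    return [score / total for score in fp_scores]


def sample_next(examples, fp_scores, rng=None):
    """Pick the next example to annotate, weighted by its score."""
    if rng is None:
        rng = random.Random()
    probs = sampling_probabilities(fp_scores)
    return rng.choices(examples, weights=probs, k=1)[0]
```

Sampling in proportion to the score, rather than always taking the top-scoring example, keeps some diversity in what the annotator sees while still favoring borderline cases.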

TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration

Choi, Evensen, Demiralp, et al. 2021

Companion Proceedings of the Web Conference 2021

Despite rapid developments in the field of machine learning research, collecting high-quality labels for supervised learning remains a bottleneck for many applications. This difficulty is exacerbated by the fact that state-of-the-art models for NLP tasks are becoming deeper and more complex, often increasing the amount of training data required even for fine-tuning. Weak supervision methods, including data programming, address this problem and reduce the cost of label collection by using noisy label sources for supervision. However, until recently, data programming was only accessible to users who knew how to program. In order to bridge this gap, the Data Programming by Demonstration (DPBD) framework was proposed to facilitate the automatic creation of labeling functions based on a few examples labeled by a domain expert. This framework has proven successful for generating high-accuracy labeling models for document classification. In this work, we extend the DPBD framework to span-level annotation tasks, arguably one of the most time-consuming NLP labeling tasks. We built a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming and encourages them to explore trade-offs between different labeling models and active learning strategies. We empirically demonstrated that an annotator could achieve a higher F1 score using the proposed tool compared to manual labeling for different span-level annotation tasks.

CCS Concepts: Information systems → Web applications.
