Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.601

Learning with Different Amounts of Annotation: From Zero to Many Labels

Abstract: Training NLP systems typically assumes access to annotated data that has a single human label per example. Given imperfect labeling from annotators and the inherent ambiguity of language, we hypothesize that a single label is not sufficient to learn the spectrum of language interpretation. We explore new annotation distribution schemes, assigning multiple labels per example for a small subset of training examples. Introducing such multi-label examples at the cost of annotating fewer examples brings clear gains on na…
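The trade-off the abstract describes — spending part of a fixed labeling budget on multi-label examples, at the cost of annotating fewer examples overall — can be sketched with a little budget arithmetic. This is an illustrative sketch only; the function name, parameters, and numbers are assumptions, not taken from the paper.

```python
# Hypothetical sketch of the annotation-budget trade-off: under a fixed
# label budget, collecting k labels for some examples means fewer
# examples get annotated in total.

def allocate_budget(total_labels, multi_frac, labels_per_multi):
    """Split a fixed label budget between single- and multi-label examples.

    total_labels: total number of human labels we can afford.
    multi_frac: fraction of the budget spent on multi-label examples.
    labels_per_multi: labels collected per multi-label example.
    Returns (n_single, n_multi) example counts.
    """
    multi_budget = int(total_labels * multi_frac)
    n_multi = multi_budget // labels_per_multi
    n_single = total_labels - n_multi * labels_per_multi
    return n_single, n_multi

# Spending 30% of a 10,000-label budget on 5-way-labeled examples:
n_single, n_multi = allocate_budget(10_000, 0.3, 5)
print(n_single, n_multi)  # 7000 single-label examples, 600 five-way examples
```

Under this split, 3,000 of the 10,000 labels buy only 600 (richer) examples instead of 3,000 single-label ones — the cost the abstract refers to.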

Cited by 7 publications (6 citation statements)
References 41 publications
“…Another study by Bai et al. (2021) framed domain adaptation with a constrained budget as a consumer-choice problem and evaluated the utility of different combinations of pretraining and data annotation under varying budget constraints. Zhang et al. (2021) explored new annotation distribution schemes, assigning multiple labels per example for a small subset of training examples, and proposed a learning algorithm that efficiently combines signals from such uneven training data. Finally, Chen et al. (2022) proposed an approach that reserves a fraction of annotations to explicitly clean up highly probable error samples, optimizing the annotation process.…”
Section: Related Work
confidence: 99%
“…This not only helps save annotation time and budget but also ensures efficient utilization of available resources. While some research (Wan et al., 2023; Zhang et al., 2021) has offered insights and suggestions on finding the optimal number of annotators, a definitive solution to this problem has yet to be achieved.…”
Section: Introduction
confidence: 99%
“…Future work could expand the training data of our NLI model to account for the subjectivity of NLI judgments (Pavlick and Kwiatkowski, 2019; Nie et al., 2020), particularly by modifying our data collection procedure (Zhang et al., 2021).…”
Section: Context Assumptions
confidence: 99%
“…Bai et al (2021) show that similar trade-offs exist when performing domain adaptation on a constrained budget. Zhang et al (2021) observe that difficult examples benefit from additional annotations, so optimal spending actually varies the amount of labels given to each example. Our approach actively targets examples for relabeling based on its likelihood of noise, whereas they randomly select examples for multi-labeling without considering its characteristics.…”
Section: Denoising Techniquesmentioning
confidence: 99%
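The last citation statement contrasts two ways of choosing which examples receive extra labels: random selection versus targeting likely annotation errors. A minimal sketch of that contrast, assuming "noise likelihood" is approximated by the model's disagreement with the current label (all names and data here are illustrative, not from either paper):

```python
# Two selection strategies for spending extra annotation labels:
# random multi-labeling vs. targeting the examples whose current label
# the model finds least probable (i.e. the most likely label noise).
import random

def select_random(n_examples, k, seed=0):
    """Randomly pick k example indices for extra labels."""
    rng = random.Random(seed)
    return rng.sample(range(n_examples), k)

def select_by_noise(model_probs, labels, k):
    """Pick the k examples whose assigned label gets the lowest model
    probability, treating low probability as high noise likelihood."""
    scores = [probs[y] for probs, y in zip(model_probs, labels)]
    return sorted(range(len(labels)), key=lambda i: scores[i])[:k]

# Four binary-classification examples; example 1's label disagrees
# sharply with the model, example 2's only mildly.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]
labels = [0, 0, 1, 1]
print(select_by_noise(probs, labels, 2))  # [1, 2]
```

Random selection ignores `probs` entirely, so its annotation budget is spread over examples regardless of how suspect their labels are; the targeted variant concentrates relabeling where the current label is most likely wrong.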