Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning

Zhou, Xiang; Nie, Yixin; Bansal, Mohit

doi:10.48550/arxiv.2104.08676

Cited by 1 publication (2 citation statements). References 32 publications.


“…However, recent trends in NLP have begun questioning aggregation, arguing that subjective labels should not be aggregated if multiple opinions are valid. Rather, this line of work ([38,58]) suggests predicting the distribution of human opinions, rather than the majority vote. One implication that follows is that individual annotator performance becomes more important, since one cannot aggregate away labeling error using a simple majority vote.…”

Section: Related Work (mentioning; confidence: 99%)
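To make the contrast in the statement above concrete, here is a minimal Python sketch, assuming three-way NLI labels and ten hypothetical annotations for a single premise-hypothesis pair. The helper names (`majority_label`, `label_distribution`, `js_divergence`) are illustrative only and are not taken from the Distributed NLI paper or from any library. It shows that a majority vote keeps only the top label, whereas the empirical distribution preserves the minority opinions that a distribution-predicting model would be scored against.

```python
import math
from collections import Counter

# Assumed label set for three-way NLI (illustrative ordering).
LABELS = ("entailment", "neutral", "contradiction")


def majority_label(annotations):
    """Aggregate by majority vote; ties are broken by the order of LABELS."""
    counts = Counter(annotations)
    # max() returns the first maximal element, so earlier LABELS win ties.
    return max(LABELS, key=lambda lab: counts[lab])


def label_distribution(annotations):
    """Empirical distribution of human opinions over LABELS."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts[lab] / total for lab in LABELS]


def _kl(p, q):
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i >= p_i / 2 and m_i >= q_i / 2, so _kl never divides by zero here.
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


if __name__ == "__main__":
    # Hypothetical annotations: 6 entailment, 3 neutral, 1 contradiction.
    ann = ["entailment"] * 6 + ["neutral"] * 3 + ["contradiction"]
    human = label_distribution(ann)
    print("majority vote:", majority_label(ann))
    print("human distribution:", human)
    print("JSD, one-hot prediction:", js_divergence(human, [1.0, 0.0, 0.0]))
    print("JSD, soft prediction:", js_divergence(human, [0.55, 0.35, 0.1]))
```

With these annotations the majority vote is `entailment`, discarding the 40% of annotators who chose otherwise, and the divergence penalizes a confident one-hot prediction more than a softer one that matches the human split. Jensen-Shannon divergence is used here as one common way to compare distributions; it is not necessarily the metric used in the cited work.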

“…We evaluate the performance of workers against the ground-truth labels (§3.3). Majority labels are often computed to mitigate labeling error [52], but recent work has also shown the utility of high-quality individual annotations in order to estimate the distributions of human opinion [58]. The latter is particularly relevant in our setting where workers are labeling often subjective concerns: being able to measure the degrees of concern across individuals is relevant towards reducing vaccine hesitancy.…”

Section: Performance Comparison (mentioning; confidence: 99%)


Interface Design for Crowdsourcing Hierarchical Multi-Label Text Annotations

Stureborg, Dhingra, Yang (2023). Preprint.


Human data labeling is an important and expensive task at the heart of supervised learning systems. Hierarchies help humans understand and organize concepts. We ask whether and how concept hierarchies can inform the design of annotation interfaces to improve labeling quality and efficiency. We study this question through annotation of vaccine misinformation, where the labeling task is difficult and highly subjective. We investigate 6 user interface designs for crowdsourcing hierarchical labels by collecting over 18,000 individual annotations. Under a fixed budget, integrating hierarchies into the design improves crowdsource workers' F1 scores. We attribute this to (1) grouping similar concepts, improving F1 scores by +0.16 over random groupings; (2) strong relative performance on high-difficulty examples (relative F1 score difference of +0.40); and (3) filtering out obvious negatives, increasing precision by +0.07. Ultimately, labeling schemes integrating the hierarchy outperform those that do not, achieving a mean F1 of 0.70.

CCS Concepts: • Human-centered computing → HCI design and evaluation methods.
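The abstract reports worker F1 and precision against ground-truth labels. As a rough illustration of how such scores can be computed for multi-label annotations, the sketch below micro-averages precision, recall and F1 over one worker's label sets. The function name `micro_prf` and the vaccine-concern label names are hypothetical, and micro-averaging is an assumption on our part; the study may aggregate differently (for example per label or per item).

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over multi-label items.

    gold[i] and pred[i] are the sets of labels for item i (ground truth
    and one worker's annotation, respectively).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same items")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels the worker chose correctly
        fp += len(p - g)  # labels chosen but not in the ground truth
        fn += len(g - p)  # ground-truth labels the worker missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical ground truth and one worker's labels for three texts.
    gold = [{"side_effects"}, {"efficacy", "mandates"}, set()]
    worker = [{"side_effects"}, {"efficacy"}, {"conspiracy"}]
    print(micro_prf(gold, worker))
```

In this example the worker gets two labels right, adds one spurious label (a false positive on an item with no concerns) and misses one, so precision, recall and F1 all equal 2/3. Removing spurious labels on clear negatives raises precision directly, which is the kind of effect the abstract attributes to filtering out obvious negatives.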
