Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1176

A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation

Abstract: We present a corpus of anaphoric information (coreference) crowdsourced through a game-with-a-purpose. The corpus, containing annotations for about 108,000 markables, is one of the largest corpora for coreference for English, and one of the largest crowdsourced NLP corpora, but its main feature is the large number of judgments per markable: 20 on average, and over 2.2M in total. This characteristic makes the corpus a unique resource for the study of disagreements on anaphoric interpretation. A second distinctiv…
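As a quick, hypothetical sanity check on the figures quoted in the abstract (not taken from the paper itself), the approximate markable count and the average number of judgments per markable multiply out to roughly the reported total:

```python
# Back-of-the-envelope check of the corpus statistics quoted in the abstract.
# Inputs are the rounded figures from the abstract, so the result is approximate.
markables = 108_000          # "about 108,000 markables"
judgments_per_markable = 20  # "20 on average"

total_judgments = markables * judgments_per_markable
print(f"{total_judgments:,} judgments")  # 2,160,000 -- same ballpark as the reported "over 2.2M"
```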

Cited by 35 publications (50 citation statements); references 41 publications.
“…Prior work examines how writer intentions are often misaligned with reader perceptions (Chang et al., 2020), which further motivates our focus on the reader (our annotator). While our work focuses on subjectivity, ambiguity is studied in many NLP tasks, including Natural Language Inference (Pavlick and Kwiatkowski, 2019; Nie et al., 2020), evaluation of NLG (Schoch et al., 2020), a recent SemEval 2021 shared task, as well as several discourse tasks (Asher and Lascarides, 2003; Versley, 2011; Webber and Joshi, 2012; Das et al., 2017; Poesio et al., 2019; Webber et al., 2019). Only one study strives to understand how these ambiguities are resolved: Scholman (2019) shows different interpretations of ambiguous coherence relations can be attributable to different cognitive biases.…”
Section: Zuckerberg (mentioning, confidence: 99%)
“…Discourse, like many uses of language, has inherent ambiguity, meaning it can have multiple, valid interpretations. Much work has focused on characterizing these "genuine disagreements" (Asher and Lascarides, 2003; Das et al., 2017; Poesio et al., 2019; Webber et al., 2019) and incorporating their uncertainty through concurrent labels (Rohde et al., 2018) and underspecified structures (Hanneforth et al., 2003). However, prior work does not examine the subjectivity of discourse: how you resolve an ambiguity by applying your personal beliefs and preferences.…”
Section: Introduction (mentioning, confidence: 99%)
“…The PD corpus was created using the Phrase Detectives game, whose players are asked to find the antecedent/split-antecedents closest to the mention in question (Poesio et al., 2019). The corpus comes with all raw annotations and silver labels aggregated using the Mention-Pair Annotation model (Paun et al., 2018).…”
Section: Auxiliary Corpora (mentioning, confidence: 99%)
“…We evaluated four different augmentation settings. Two of these involve using additional examples of split antecedent anaphora recoverable from the crowdsourced Phrase Detectives corpus (PD) (Poesio et al., 2019), a corpus of anaphoric annotations, including split-antecedent anaphors, collected using the Phrase Detectives game. The corpus includes both raw annotations and silver labels aggregated using the Mention Pair Annotations model (Paun et al., 2018).…”
(mentioning, confidence: 99%)
“…Recent annotation studies recognize that ambiguity, vagueness and varying degrees of difficulty are inherent to semantic phenomena (Erk et al., 2003; Kairam and Heer, 2016; Poesio et al., 2019; Pavlick and Kwiatkowski, 2019). Pavlick and Kwiatkowski (2019) demonstrate that the fundamental task of Natural Language Inferencing contains large proportions of instances with multiple valid interpretations and argue that this phenomenon is central to the task rather than an aspect which can be disregarded.…”
Section: Related Work (mentioning, confidence: 99%)