Proceedings of the 2008 Conference on Semantics in Text Processing - STEP '08 2008
DOI: 10.3115/1626481.1626511
Addressing the resource bottleneck to create large-scale annotated texts

Abstract: Large-scale linguistically annotated resources have become available in recent years. This is partly due to sophisticated automatic and semi-automatic approaches that work well on specific tasks such as part-of-speech tagging. For more complex linguistic phenomena like anaphora resolution, there are no tools that produce high-quality annotations without massive user intervention. Annotated corpora of the size needed for modern computational linguistics research cannot, however, be created by small groups of hand …

Cited by 10 publications (8 citation statements) | References 7 publications
“…Following the idea of Phrase Detectives (Chamberlain et al., 2008), in the GMB a game with a purpose (GWAP) was introduced to annotate parts of speech, antecedents of pronouns, noun compound relations (Bos and Nissim, 2015), and word senses (Venhuizen et al., 2013). The quality of annotations harvested from gamification was generally high, but the amount of annotations was relatively low; it would literally take years to annotate the entire GMB corpus.…”
Section: Be Careful With the Crowd
confidence: 99%
“…he, she, it), rather than plural pronouns and pronouns referring to abstract entities. Nonetheless, current efforts to create large-scale annotated resources for anaphora, of which the crowdsourcing method presented in Chamberlain et al. (2008) is a key example, will certainly boost the performance of machine learning approaches in the near future.…”
Section: Constructing Semantic Representations
confidence: 99%
“…As we will see, it is one of the contentions of this article that the collaborative approach to resource creation can also result in a better understanding of the complexity of language interpretation. This work is meant to be the definitive reference article on Phrase Detectives, collecting in a single publication material previously only found in separate papers such as Chamberlain et al. [2008a, 2008b, 2009a, 2009b] and Kruschwitz et al. [2009], and additional material not presented before, including a cost comparison between games, traditional annotation, and crowdsourcing, and a discussion of recent developments such as the Facebook version of the game. Our objective is to provide an assessment of the methodology and to summarize the lessons we learned so that other researchers may decide whether this methodology is appropriate for other HLT tasks.…”
Section: Introduction
confidence: 99%