2022 · DOI: 10.7557/12.6348

All that glitters...

Abstract: Evaluation has emerged as a central concern in natural language processing (NLP) over the last few decades. Evaluation is done against a gold standard, a manually linguistically annotated dataset, which is assumed to provide the ground truth against which the accuracy of the NLP system can be assessed automatically. In this article, some methodological questions in connection with the creation of gold standard datasets are discussed, in particular (non-)expectations of linguistic expertise in annotators and th…

Cited by 1 publication (1 citation statement) · References 37 publications

“…This is part of a larger set of issues increasingly discussed in data‐driven NLP with regard to how linguistic data are annotated, such as the role of expert knowledge versus native‐speaker intuition (and perhaps more importantly: who counts as an expert), how to deal with variation in the annotations, and how to take factors such as these into account when setting up annotation tasks, as well as for calculating both IAA and machine learning accuracy (e.g. Babarczy et al., 2006; Bayerl & Paul, 2011; Borin, 2022; Gillick & Liu, 2010; Plank, 2022; Plank et al., 2014; Uma et al., 2021).…”
Section: Summary: Issues and Perspectives
Citation type: mentioning
Confidence: 99%
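
The inter-annotator agreement (IAA) mentioned in the citation statement is commonly quantified with a chance-corrected coefficient such as Cohen's kappa, which compares observed agreement between two annotators against the agreement expected if they labeled independently. A minimal sketch, using hypothetical POS-tag annotations (the labels and data below are illustrative, not from the cited paper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical POS tags assigned by two annotators to the same ten tokens.
ann1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB", "ADV", "NOUN", "ADJ", "VERB"]
ann2 = ["NOUN", "VERB", "ADJ",  "ADJ", "NOUN", "NOUN", "ADV", "NOUN", "ADJ", "VERB"]
print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")  # ~0.72
```

How disagreements like those above are treated (as noise to adjudicate away or as legitimate variation to model) is precisely the kind of methodological choice the cited discussion raises.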