2023
DOI: 10.3384/nejlt.2000-1533.2023.4529

Barriers and enabling factors for error analysis in NLG research

Abstract: Earlier research has shown that few studies in Natural Language Generation (NLG) evaluate their system outputs using an error analysis, despite known limitations of automatic evaluation metrics and human ratings. This position paper takes the stance that error analyses should be encouraged, and discusses several ways to do so. This paper is not just based on our shared experience as authors, but we also distributed a survey as a means of public consultation. We provide an overview of existing barriers to carry…

Cited by 12 publications (12 citation statements) | References 35 publications

Citation statements (ordered by relevance):
“…generation researchers to make more informed design choices. A suitable interface can also encourage researchers to step away from unreliable automatic metrics (Gehrmann et al., 2022) and focus on manual error analysis (van Miltenburg et al., 2021, 2023).…”
Section: Web Interface (mentioning)
confidence: 99%
“…This will put linguistic theories to the test: are they concrete enough to operationalise notions like interactive alignment (Rasenberg, Özyürek & Dingemanse 2020) and the co-construction of social action (Sidnell & Enfield 2012)? Empirical work on how people coordinate joint action and deal with misunderstandings shifts from a relatively peripheral topic to a domain of key relevance (Ashktorab et al. 2019); likewise, for technology, work on error analysis and the measurement of performance becomes only more urgent (van Miltenburg et al. 2023). With the recent surge in attention surrounding "language models", linguists need to reconsider the technological and theoretical models of language at play.…”
Section: Human-computer Interaction (mentioning)
confidence: 99%

Citing publication: Reimagining language (Rasenberg, Amha, Coler et al., 2023)
“…However, a large variety of different annotation schemes have been created (Huidrom and Belz, 2022), often task and/or domain-specific, which makes comparison between output annotations and thus incremental progress difficult. A standardised, task-agnostic error annotation taxonomy would not only help in comparing different NLP system outputs for performance analysis, but it would also aid in developing automatic or semi-automatic error metrics for various NLP tasks (van Miltenburg et al., 2021).…”
Section: Introduction (mentioning)
confidence: 99%