Proceedings of the Tenth Workshop on Statistical Machine Translation 2015
DOI: 10.18653/v1/w15-3059

How do Humans Evaluate Machine Translation

Abstract: In this paper, we take a closer look at the MT evaluation process from a glass-box perspective using eye-tracking. We analyze two aspects of the evaluation task, the background of evaluators (monolingual or bilingual) and the sources of information available, and we evaluate them using time and consistency as criteria. Our findings show that monolinguals are slower but more consistent than bilinguals, especially when only target language information is available. When exposed to various sources of information,…

Cited by 12 publications (6 citation statements)
References 12 publications
“…In the latter case, the translation is most often presented to bilingual evaluators, who know both the source and the target language, so that they assign a quality score to a given segment, e.g. from 1 = poor to 5 = excellent (see Guzmán et al. 2015). The criteria typically used are: adequacy, i.e. preservation of meaning; fluency, i.e. grammaticality; overall quality (based on a combination of both criteria); and the expected cognitive effort of post-editing (Popović 2018).…”
Section: Quality Assessment (unclassified)
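
To make the scoring scheme described in that statement concrete, the following is a minimal sketch (my own illustration, not code from Guzmán et al. 2015 or Popović 2018) of recording segment-level 1–5 judgements for adequacy and fluency and averaging them into an overall score, in Python:

# Minimal sketch: segment-level MT quality judgements on a 1-5 scale.
# Criterion names follow the citation statement above (adequacy, fluency,
# overall quality); the data and structure are illustrative assumptions.
from dataclasses import dataclass
from statistics import mean

@dataclass
class SegmentJudgement:
    segment_id: int
    adequacy: int   # 1 = poor ... 5 = excellent (preservation of meaning)
    fluency: int    # 1 = poor ... 5 = excellent (grammaticality)

judgements = [
    SegmentJudgement(1, adequacy=4, fluency=5),
    SegmentJudgement(2, adequacy=2, fluency=3),
    SegmentJudgement(3, adequacy=5, fluency=4),
]

# Overall quality is modelled here simply as the mean of both criteria.
overall = [mean((j.adequacy, j.fluency)) for j in judgements]
print(f"Mean adequacy: {mean(j.adequacy for j in judgements):.2f}")
print(f"Mean fluency:  {mean(j.fluency for j in judgements):.2f}")
print(f"Mean overall:  {mean(overall):.2f}")
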
“…Human evaluation of text quality: Most previous studies on human evaluation concentrate on constrained generation domains, such as machine translation (Guzmán et al., 2015; Graham et al., 2017; Toral et al., 2018; Castilho, 2021) or summarization (Gillick and Liu, 2010; Iskender et al., 2020). Other studies evaluate very short, often one-sentence-long, outputs (Grundkiewicz et al., 2015; Mori et al., 2019; Khashabi et al., 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…Research in manual evaluation has focused on overcoming annotator bias, i.e. the preferences and expectations of individual annotators with respect to translation quality that lead to low levels of inter-annotator agreement (Cohn and Specia, 2013; Denkowski and Lavie, 2010; Graham et al., 2013; Guzmán et al., 2015). The problem of reference bias, however, has not been examined in previous work.…”
Section: Related Work (mentioning)
confidence: 99%
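
The low inter-annotator agreement discussed in that statement is usually quantified with a chance-corrected statistic such as Cohen's kappa. Below is a minimal sketch, assuming two annotators have scored the same segments on a 1–5 scale and using scikit-learn's cohen_kappa_score; the scores are illustrative, not data from the cited studies:

# Minimal sketch: chance-corrected agreement between two MT evaluators.
from sklearn.metrics import cohen_kappa_score

annotator_a = [5, 4, 4, 2, 3, 5, 1, 4]
annotator_b = [5, 3, 4, 2, 2, 5, 2, 4]

# Unweighted kappa treats every disagreement equally; quadratic weights
# penalise large disagreements (e.g. 1 vs 5) more than adjacent scores.
kappa = cohen_kappa_score(annotator_a, annotator_b)
weighted = cohen_kappa_score(annotator_a, annotator_b, weights="quadratic")

print(f"Cohen's kappa:            {kappa:.3f}")
print(f"Quadratic-weighted kappa: {weighted:.3f}")
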