Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6312
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation

Abstract: We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translat…

Cited by 148 publications (146 citation statements)
References 16 publications
“…We first confirmed that the results are not due to the original language of the reference sentences being English in half of the evaluated sentences and Czech in the other half of the test dataset (Supplementary Fig. 4; Methods 13), which was proposed to be a potential confounding factor by the WMT organizers [17] and others [22,23].…”
Section: Results (supporting)
Confidence: 60%
“…Our work empirically strengthens and extends the recommendations on human MT evaluation in previous work (Läubli et al., 2018; Toral, Castilho, et al., 2018), some of which have meanwhile been adopted by the large-scale evaluation campaign at WMT 2019 (Barrault et al., 2019): the new evaluation protocol uses original source texts only (R5) and gives raters access to document-level context (R2). The findings of WMT 2019 provide further evidence in support of our recommendations.…”
Section: Recommendations (supporting)
Confidence: 68%
“…Neural Machine Translation (NMT) has provided impressive advances in translation quality, leading to a discussion whether translations produced by professional human translators can still be distinguished from the output of NMT systems, and to what extent automatic evaluation measures can reliably account for these differences (Hassan Awadalla et al., 2018; Läubli et al., 2018; Toral et al., 2018). One answer to this question lies in the development of so-called test suites (Burchardt et al., 2017) or challenge sets (Isabelle et al., 2017) that focus on particular linguistic phenomena that are known to be difficult to evaluate with simple reference-based metrics such as BLEU.…”
Section: Introduction (mentioning)
Confidence: 99%
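To make concrete why BLEU is considered a "simple reference-based metric" in the excerpt above, here is a minimal sketch of its core computation: clipped n-gram precision combined with a brevity penalty. This is an illustrative simplification (single reference, no smoothing), not the implementation used in any of the cited studies; production toolkits add smoothing and multi-reference support.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference.

    Simplified: with one reference and no smoothing, any missing
    n-gram order zeroes the whole score (real toolkits smooth this).
    """
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped matches: each candidate n-gram counts at most as
        # often as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Because the score only rewards surface n-gram overlap, two translations with the same BLEU can differ sharply on the targeted phenomena (agreement, coreference, discourse) that the test suites and challenge sets mentioned above are designed to probe.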