2014
DOI: 10.1007/s10590-014-9162-z

Data-driven annotation of binary MT quality estimation corpora based on human post-editions

Abstract: Advanced computer-assisted translation (CAT) tools include automatic quality estimation (QE) mechanisms to support post-editors in identifying and selecting useful suggestions. Based on supervised learning techniques, QE relies on high-quality data annotations obtained from expensive manual procedures. However, as the notion of MT quality is inherently subjective, such procedures might result in unreliable or uninformative annotations. To overcome these issues, we propose an automatic method to obtain binary an…

Cited by 12 publications (13 citation statements)
References 19 publications
“…Based on their post-edits, the raw MT output samples are then labeled as 'good' or 'bad' by considering the HTER (Snover et al., 2006) calculated between the raw MT output and its post-edited version. Our labeling criterion follows the empirical findings of Turchi et al. (2013, 2014b), which indicate an HTER value of 0.4 as the boundary between post-editable (HTER ≤ 0.4) and useless suggestions (HTER > 0.4). Then, to model the subjective concept of quality of different subjects, for each translator we train a separate binary QE classifier on the labeled samples.…”
Section: Getting Binary Quality Labels
Mentioning confidence: 98%
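The labeling criterion quoted above is straightforward to reproduce. Below is a minimal Python sketch, not the paper's implementation: HTER is approximated here by word-level Levenshtein edits normalized by post-edit length (true HTER, per Snover et al. (2006), also models block shifts), the 0.4 threshold comes directly from the quoted text, and the function names and whitespace tokenization are illustrative choices.

# Minimal sketch of the binary labeling criterion described above.
# Simplification: no block-shift operation, so this only approximates HTER.

def word_edit_distance(hyp, ref):
    # Levenshtein distance over token sequences.
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def approx_hter(mt_output, post_edit):
    # Edits needed to turn the MT output into its post-edit,
    # normalized by post-edit length.
    hyp, ref = mt_output.split(), post_edit.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)

def binary_label(mt_output, post_edit, threshold=0.4):
    # 'good' = post-editable (HTER <= 0.4); 'bad' = useless (HTER > 0.4).
    return "good" if approx_hter(mt_output, post_edit) <= threshold else "bad"

With labels produced this way, the per-subject modeling the statement describes amounts to training one binary classifier per translator, each on that translator's own labeled samples.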
“…Three different tokens are used, namely "no post-edit" (no edits are required), "light post-edit" (minimal edits are required), and "heavy post-edit" (a large number of edits are required). At training time, the instances are labelled based on the TER computed between the MT output and its post-edited version, with the boundary between light and heavy post-edit set to TER = 0.4 based on the findings reported in (Turchi et al., 2013; Turchi et al., 2014). At test time, tokens are predicted with two approaches.…”
Section: Participants
Mentioning confidence: 99%
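A sketch of the corresponding three-way labeling rule might look as follows. The 0.4 light/heavy boundary is the one quoted above; mapping "no post-edit" to TER = 0 follows from "no edits are required", and everything else is an illustrative assumption.

# Hypothetical three-way labeling rule for the token scheme described above.
def post_edit_token(ter_score, light_heavy_boundary=0.4):
    if ter_score == 0.0:
        return "no post-edit"     # no edits required
    elif ter_score <= light_heavy_boundary:
        return "light post-edit"  # minimal edits required
    else:
        return "heavy post-edit"  # large number of edits required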
“…For instance, based on the empirical findings reported in (Turchi et al., 2013, 2014), TER = 0.4 is the threshold that, for human post-editors, separates the "post-editable" translations from those that require complete rewriting from scratch.…”
Mentioning confidence: 99%