Proceedings of the Third Conference on Machine Translation: Shared Task Papers 2018
DOI: 10.18653/v1/w18-6471

Multi-source transformer with combined losses for automatic post editing

Abstract: Recent approaches to the Automatic Post-editing (APE) of Machine Translation (MT) have shown that best results are obtained by neural multi-source models that correct the raw MT output by also considering information from the corresponding source sentence. To this aim, we present for the first time a neural multi-source APE model based on the Transformer architecture. Moreover, we employ sequence-level loss functions in order to avoid exposure bias during training and to be consistent with the automatic evaluat…

Cited by 21 publications (33 citation statements)
References 16 publications

“…The other approach exploits a retrieval-based method similar to (Farajian et al, 2017): given a query containing the source and the MT output to be post-edited, it: i) retrieves similar triplets from the training data, ii) ranks them based on the sentence level BLEU score between the MT output and the post-edit, and iii) creates the token based on the TER computed between the MT output and the post-edit of the most similar triplet. The backbone architecture is the multi-source extension of Transformer (Vaswani et al, 2017) described in (Tebbifakhr et al, 2018), which is trained both on the task data and on the available artificial corpora.…”
Section: Participants (mentioning)
confidence: 99%
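As a rough illustration of the retrieve-and-rank step quoted above, the sketch below orders candidate (source, MT, post-edit) triplets by sentence-level BLEU between the query's MT output and each candidate's post-edit, then derives a coarse similarity token from the TER against the best match. The triplet layout, the bucket thresholds, and the use of sacrebleu are assumptions made for illustration, not the participants' actual implementation.

```python
# Hedged sketch of the retrieve-and-rank step (illustrative, not the
# participants' code). Assumes `sacrebleu` is installed and that triplets
# are plain (source, raw MT, post-edit) strings.
from typing import List, Tuple

import sacrebleu

Triplet = Tuple[str, str, str]  # (source, raw MT output, post-edit)


def rank_by_sentence_bleu(query_mt: str, candidates: List[Triplet]) -> List[Triplet]:
    """Order retrieved triplets by sentence-level BLEU between the query's
    MT output and each candidate's post-edit (higher BLEU first)."""
    scored = [
        (sacrebleu.sentence_bleu(query_mt, [pe]).score, (src, mt, pe))
        for src, mt, pe in candidates
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [triplet for _, triplet in scored]


def similarity_token(query_mt: str, best_pe: str) -> str:
    """Map the TER between the MT output and the best-matching post-edit to a
    coarse token that can be prepended to the model input; the three buckets
    are an illustrative choice, not taken from the system description."""
    ter = sacrebleu.sentence_ter(query_mt, [best_pe]).score
    if ter < 10.0:
        return "<sim-high>"
    if ter < 40.0:
        return "<sim-mid>"
    return "<sim-low>"
```

In the cited system, the token derived from the most similar triplet is then passed, together with the source and the raw MT output, to the multi-source Transformer backbone.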
“…SOTA APE approaches tackle the task with the multi-source transformer architectures (Junczys-Dowmunt and Grundkiewicz, 2018; Tebbifakhr et al, 2018). Two encoders encode the source and the raw MT, respectively, and an additional target-source multi-head attention component is stacked on top of the original target-source multi-head attention component.…”
Section: Related Work (mentioning)
confidence: 99%
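The following PyTorch sketch shows one way such a decoder block can be arranged: masked self-attention, cross-attention over the encoded source, and a second cross-attention over the encoded raw MT stacked on top. Model dimensions, normalization placement, and the feed-forward block are illustrative assumptions rather than the exact configuration of the cited systems.

```python
# Minimal sketch of a multi-source Transformer decoder layer with a second,
# stacked cross-attention over the raw MT encoding (illustrative only).
import torch
import torch.nn as nn


class MultiSourceDecoderLayer(nn.Module):
    def __init__(self, d_model: int = 512, nhead: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.src_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)  # over encoded source
        self.mt_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)   # stacked, over encoded raw MT
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))

    def forward(self, tgt, src_enc, mt_enc):
        # Self-attention over the target prefix (causal mask omitted for brevity).
        x = self.norms[0](tgt + self.self_attn(tgt, tgt, tgt)[0])
        # Cross-attention over the encoded source sentence.
        x = self.norms[1](x + self.src_attn(x, src_enc, src_enc)[0])
        # Additional cross-attention, stacked on top, over the encoded raw MT.
        x = self.norms[2](x + self.mt_attn(x, mt_enc, mt_enc)[0])
        return self.norms[3](x + self.ff(x))
```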
“…We see that the performance of our copycat systems depends on the difficulty of the task: we observe an improvement for Latvian (+1 absolute HTER point improvement) with the lowest baseline quality (HTER = 0.29), and minor improvements for German with a higher baseline quality (on average 0.14 HTER). As a comparison, the best performing SOTA systems achieve only up to 0.4 HTER improvement for the German dataset and this is by using millions of training data, combined losses and ensembling techniques, whereas we use minimum external resources (Tebbifakhr et al, 2018; Junczys-Dowmunt and Grundkiewicz, 2018). For EN-DE, our PNT-TRG model performs a small number of corrections (mostly accurate) to the outputs: only 50 sentences, compared to the 200-300 sentences modified by the SOTA EN-DE APE systems.…”
Section: APE (mentioning)
confidence: 99%
“…They introduce an additional joint encoder that attends over a combination of the two encoded sequences from mt and src. Tebbifakhr et al (2018), the NMT-subtask winner of WMT 2018 (wmt18 nmt best), employ sequence-level loss functions in order to avoid exposure bias during training and to be consistent with the automatic evaluation metrics. Shin and Lee (2018) propose that each encoder has its own self-attention and feed-forward layer to process each input separately.…”
Section: Related Research (mentioning)
confidence: 99%
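A common way to realize the sequence-level, metric-consistent objective mentioned here is minimum-risk / REINFORCE-style training: sample hypotheses from the model, score them with the sentence-level evaluation metric, and weight each sample's log-probability by its baseline-corrected reward. The sketch below follows that recipe; `model.sample`, `model.decode`, the `batch` fields, and the metric choice are hypothetical placeholders, and the exact combined loss of Tebbifakhr et al. (2018) may differ.

```python
# Hedged sketch of a REINFORCE-style sequence-level loss driven by a
# sentence-level evaluation metric (illustrative; the cited work's exact
# combined loss may differ).
import torch


def sequence_level_loss(model, batch, metric, num_samples: int = 5):
    """Sample hypotheses, score them with the sentence-level metric used for
    evaluation, and scale each sample's log-probability by its reward minus
    a simple mean-reward baseline (to reduce gradient variance)."""
    samples = []
    for _ in range(num_samples):
        # `model.sample` is assumed to return token ids plus the summed
        # log-probability of the sampled sequence; real toolkits expose
        # this differently.
        hyp_ids, log_prob = model.sample(batch.src, batch.mt)
        # Reward = the sentence-level evaluation metric, e.g. BLEU or 1 - TER.
        reward = metric(model.decode(hyp_ids), batch.reference)
        samples.append((log_prob, float(reward)))
    rewards = torch.tensor([r for _, r in samples])
    baseline = rewards.mean()
    # Push up the likelihood of samples that beat the baseline, push down the rest.
    loss = -sum(lp * (r - baseline) for (lp, _), r in zip(samples, rewards))
    return loss / num_samples
```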