Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1292
A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning

Abstract: Automatic post-editing (APE) seeks to automatically refine the output of a black-box machine translation (MT) system through human post-edits. APE systems are usually trained by complementing human post-edited data with large, artificial data generated through back-translations, a time-consuming process often no easier than training an MT system from scratch. In this paper, we propose an alternative where we fine-tune pre-trained BERT models on both the encoder and decoder of an APE system, exploring several par…
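To make the abstract's training setup concrete, here is a minimal sketch (an assumption-laden illustration, not the paper's code) of fine-tuning a BERT-initialized encoder-decoder APE model on ⟨src, mt, pe⟩ triplets. `ape_model` and `ape_batches` are hypothetical placeholders: the model is assumed to consume the src/mt input and the post-edit (pe) target and to return a token-level cross-entropy loss.

```python
import torch

def fine_tune(ape_model: torch.nn.Module, ape_batches, epochs: int = 3):
    """Fine-tune all parameters of a BERT-initialized APE encoder-decoder."""
    optimizer = torch.optim.AdamW(ape_model.parameters(), lr=5e-5)
    ape_model.train()
    for _ in range(epochs):
        for batch in ape_batches:      # each batch holds src/mt inputs and pe targets
            loss = ape_model(**batch)  # assumed to return the training loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```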

Cited by 18 publications (11 citation statements, all of type "mentioning"); references 17 publications. Citing publications span 2019–2024.
“…On the contrary, post-editing of system summaries through a set of basic operations such as insertion and deletion (Gu et al., 2019; Malmi et al., 2019; Dong et al., 2019b; Correia and Martins, 2019) may have intrinsic limitations by learning from single reference summaries to produce single outputs. In this paper, we provide a new dataset where each source text is associated with multiple admissible summaries to encourage diverse outputs.…”
Section: Vocabulary (mentioning)
confidence: 99%
“…On the target side, following Correia and Martins (2019), we use a single decoder where the context-attention block is initialized with the self-attention weights, and all the weights of the self-attention are shared with the respective self-attention weights in the encoder.…”
Section: BERT-based Encoder-Decoder (mentioning)
confidence: 99%
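The weight-sharing scheme described in this statement can be sketched in a few lines. The snippet below is a minimal PyTorch illustration, not the cited authors' code; module and function names are hypothetical. The decoder reuses the encoder's self-attention module outright, while the context (encoder-decoder) attention starts as a copy of those weights and is fine-tuned separately.

```python
import copy
import torch.nn as nn

def build_decoder_attention(encoder_self_attn: nn.MultiheadAttention,
                            hidden_size: int):
    """Illustrative construction of one decoder layer's attention blocks."""
    # Decoder self-attention shares parameters with the encoder's
    # self-attention: the same module object is reused, so both sides
    # update a single set of weights during fine-tuning.
    decoder_self_attn = encoder_self_attn

    # The context (encoder-decoder) attention is *initialized* from the
    # self-attention weights but kept as a separate copy, so it can
    # specialize during training.
    context_attn = copy.deepcopy(encoder_self_attn)

    # Standard position-wise feed-forward sublayer, included for completeness.
    feed_forward = nn.Sequential(
        nn.Linear(hidden_size, 4 * hidden_size),
        nn.GELU(),
        nn.Linear(4 * hidden_size, hidden_size),
    )
    return decoder_self_attn, context_attn, feed_forward
```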
“…Following Correia and Martins (2019), we adapt the BERT model to the APE task by integrating it into an encoder-decoder architecture. To this end, we use a single BERT encoder to obtain a joint representation of the src and mt sentences, and a BERT-based decoder whose multi-head context-attention block is initialized with the weights of the self-attention block.…”
Section: BERT-based Encoder-Decoder (mentioning)
confidence: 99%
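As a concrete illustration of the joint src/mt encoding, the sketch below packs the source sentence and the MT hypothesis into a single BERT input separated by [SEP] and distinguished by segment (token type) embeddings. It uses the HuggingFace transformers interface purely as a stand-in; the checkpoint name, example sentences, and framework choice are assumptions, not the setup of the cited works.

```python
from transformers import BertModel, BertTokenizerFast

# Hypothetical choice of checkpoint; any multilingual BERT would do here.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")

src = "the house is small"   # source sentence (illustrative)
mt = "das Haus ist winzig"   # machine-translated hypothesis (illustrative)

# Encodes "[CLS] src [SEP] mt [SEP]"; token_type_ids mark src tokens with 0
# and mt tokens with 1, giving a single joint representation of the pair.
inputs = tokenizer(src, mt, return_tensors="pt")
joint_repr = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
```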
“…To mitigate the data scarcity, adding synthetic data to the genuine data to expand the training set [6]–[8] has emerged as a possible solution. In particular, eSCAPE [7], a synthetic APE dataset built from parallel corpora, has been used extensively in many studies [2]–[4], [9], [10]. eSCAPE uses parallel corpora composed of bitexts, i.e., pairs of a source (src) and a reference (ref), to build a set of synthetic APE triplets ⟨src, mt, ref⟩, in which mt is the MT output of src and ref serves as pe.…”
Section: Introduction (mentioning)
confidence: 99%
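To make the triplet construction concrete, here is a minimal sketch of how a parallel bitext yields synthetic ⟨src, mt, ref⟩ triplets in the spirit of eSCAPE, with the reference reused as the post-edit. The names are assumptions: `translate` stands for an arbitrary black-box MT system and this is not eSCAPE's actual pipeline.

```python
from typing import Callable, Iterable, Iterator, Tuple

def make_synthetic_triplets(
    bitext: Iterable[Tuple[str, str]],   # (src, ref) pairs from a parallel corpus
    translate: Callable[[str], str],     # any black-box MT system (placeholder)
) -> Iterator[Tuple[str, str, str]]:
    """Yield synthetic APE triplets <src, mt, ref>."""
    for src, ref in bitext:
        mt = translate(src)   # machine-translate the source to obtain mt
        yield src, mt, ref    # the reference plays the role of the post-edit (pe)
```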