Multi-task Learning for Multilingual Neural Machine Translation

Wang, Yiren; Zhai, ChengXiang; Hassan, Hany

doi:10.18653/v1/2020.emnlp-main.75

Cited by 40 publications

(25 citation statements)

References 35 publications


“…This warns us that methods which produce strong encoders for NLU might not necessarily improve encoders for NMT. Note that, Siddhant et al (2020); Wang et al (2020) have discovered similar surprising results in cross-lingual NLU tasks.…”

Section: Auxiliary Losses (supporting)

confidence: 52%

Exploring Unsupervised Pretraining Objectives for Machine Translation

Baziotis, Titov, Birch et al. 2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We pretrain models with different methods on English↔German, English↔Nepali and English↔Sinhala monolingual data, and evaluate them on NMT. In (semi-) supervised NMT, varying the pretraining objective leads to surprisingly small differences in the finetuned performance, whereas unsupervised NMT is much more sensitive to it. To understand these results, we thoroughly study the pretrained models and verify that they encode and use information in different ways. We conclude that finetuning on parallel data is mostly sensitive to few properties that are shared by most models, such as a strong decoder, in contrast to unsupervised NMT that also requires models with strong cross-lingual abilities.


“…Note that the proposed method is slightly different from standard data augmentation (Sennrich et al, 2016a; Fadaee et al, 2017; Fadaee and Monz, 2018; and multiple-task learning (Dong et al, 2015; Kiperwasser and Ballesteros, 2018; Wang et al, 2020) in NMT research. These data augmentation techniques automatically generate pseudo data based on the original training data and then train a model using both original and generated data.…”

Section: Training (mentioning)

confidence: 94%

Fast and Accurate Neural Machine Translation with Translation Memory

He, Huang, Cui et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


It is generally believed that a translation memory (TM) should be beneficial for machine translation tasks. Unfortunately, existing wisdom demonstrates the superiority of TM-based neural machine translation (NMT) only on the TM-specialized translation tasks rather than general tasks, with a non-negligible computational overhead. In this paper, we propose a fast and accurate approach to TM-based NMT within the Transformer framework: the model architecture is simple and employs a single bilingual sentence as its TM, leading to efficient training and inference; and its parameters are effectively optimized through a novel training criterion. Extensive experiments on six TM-specialized tasks show that the proposed approach substantially surpasses several strong baselines that use multiple TMs, in terms of BLEU and running time. In particular, the proposed approach also advances the strong baselines on two general tasks (WMT news Zh→En and En→De).


“…Zhou et al (2019) propose multi-task training with a denoising objective to improve the robustness of NMT models. Wang et al (2020) show that multi-task learning with two additional denoising tasks on the monolingual data can effectively improve translation quality. Our training strategy can also be viewed as multi-task learning as we train our multilingual model on single-source and bi-source inputs jointly.…”

Section: Related Work (mentioning)

confidence: 95%

“…En→X We follow the preprocessing steps in Wang et al (2020): we remove duplicated sentence pairs and the pairs with the same source and target sequences from the training corpora and then tokenize all data using SentencePiece (Kudo and Richardson, 2018) with a shared vocabulary size of 64K tokens. Table 2 shows the training data size after preprocessing and the test set for each language pair.…”

Section: Preprocessing (mentioning)

confidence: 99%
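
The filtering part of the preprocessing quoted above (dropping duplicated sentence pairs and pairs whose source and target are identical) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the names `filter_pairs` and `train_shared_vocab` are hypothetical, stripping surrounding whitespace before comparison is an assumption, and the SentencePiece step is shown only as an uncalled helper that requires the third-party `sentencepiece` package.

```python
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]


def filter_pairs(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Yield pairs in order, skipping exact duplicates and source == target pairs."""
    seen = set()
    for src, tgt in pairs:
        # Assumption: surrounding whitespace is not meaningful for comparison.
        src, tgt = src.strip(), tgt.strip()
        if src == tgt:
            continue  # source and target sequences are the same
        if (src, tgt) in seen:
            continue  # duplicated sentence pair
        seen.add((src, tgt))
        yield src, tgt


def train_shared_vocab(corpus_path: str, model_prefix: str,
                       vocab_size: int = 64000) -> None:
    """Hypothetical helper: train one shared SentencePiece model on a text file.

    Not called in this sketch; needs `pip install sentencepiece`.
    """
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        input=corpus_path, model_prefix=model_prefix, vocab_size=vocab_size
    )


# Toy data, invented for illustration only.
demo = [
    ("Hello", "Hallo"),
    ("Hello", "Hallo"),          # duplicate pair
    ("OK", "OK"),                # identical source and target
    ("Good morning", "Guten Morgen"),
]
cleaned = list(filter_pairs(demo))
print(cleaned)
```

Keeping a set of already-seen pairs preserves the original corpus order, which a plain `set(pairs)` would not.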

Improving Multilingual Neural Machine Translation with Auxiliary Source Languages

Yin et al. 2021

Findings of the Association for Computational Linguistics: EMNLP 2021


Multilingual neural machine translation models typically handle one source language at a time. However, prior work has shown that translating from multiple source languages improves translation quality. Different from existing approaches on multi-source translation that are limited to the test scenario where parallel source sentences from multiple languages are available at inference time, we propose to improve multilingual translation in a more common scenario by exploiting synthetic source sentences from auxiliary languages. We train our model on synthetic multi-source corpora and apply random masking to enable flexible inference with single-source or bi-source inputs. Extensive experiments on Chinese/English→Japanese and a large-scale multilingual translation benchmark show that our model outperforms the multilingual baseline significantly by up to +4.0 BLEU with the largest improvements on low-resource or distant language pairs.
