Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.75

Multi-task Learning for Multilingual Neural Machine Translation

Abstract: While monolingual data has been shown to be useful in improving bilingual neural machine translation (NMT), effectively and efficiently leveraging monolingual data for Multilingual NMT (MNMT) systems is a less explored area. In this work, we propose a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data. We conduct extensive empirical studies on MNMT systems with 10 language pairs from WMT datasets. We show th…
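As a rough illustration of the training setup the abstract describes (a translation task on bitext plus denoising tasks on monolingual data), here is a minimal PyTorch-style sketch. The model interface, the single generic denoising loss, and the task weights are assumptions made for illustration, not the paper's exact objectives or schedule.

```python
# Minimal sketch of joint multi-task training: one translation loss on bitext
# plus a denoising loss on monolingual text. Model/task details here are
# illustrative placeholders, not the paper's exact architecture or schedule.
import torch
import torch.nn as nn

def mtl_step(model, bitext_batch, mono_batch, noise_fn, optimizer,
             w_translate=1.0, w_denoise=0.5):
    """One optimizer step over the combined multi-task objective."""
    # Assumes the model exposes a pad_id attribute and a (src, tgt_in) -> logits call.
    criterion = nn.CrossEntropyLoss(ignore_index=model.pad_id)

    # 1) Supervised translation task on parallel data.
    src, tgt = bitext_batch
    logits = model(src, tgt[:, :-1])                      # teacher forcing
    loss_translate = criterion(logits.transpose(1, 2), tgt[:, 1:])

    # 2) Denoising task on monolingual data: corrupt the input
    #    (e.g., masking / word dropout) and train the model to restore it.
    mono = mono_batch
    noisy = noise_fn(mono)                                # corruption is a stand-in
    logits = model(noisy, mono[:, :-1])
    loss_denoise = criterion(logits.transpose(1, 2), mono[:, 1:])

    loss = w_translate * loss_translate + w_denoise * loss_denoise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```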

Cited by 40 publications (25 citation statements)
References 35 publications

“…This warns us that methods which produce strong encoders for NLU might not necessarily improve encoders for NMT. Note that Siddhant et al (2020) and Wang et al (2020) have discovered similar surprising results in cross-lingual NLU tasks.…”
Section: Auxiliary Losses (supporting)
confidence: 52%
“…Note that the proposed method is slightly different from standard data augmentation (Sennrich et al, 2016a; Fadaee et al, 2017; Fadaee and Monz, 2018) and multi-task learning (Dong et al, 2015; Kiperwasser and Ballesteros, 2018; Wang et al, 2020) in NMT research. These data augmentation techniques automatically generate pseudo data based on the original training data and then train a model using both original and generated data.…”
Section: Training (mentioning)
confidence: 94%
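For contrast with the distinction drawn in the statement above, this is what the generic data-augmentation recipe looks like in outline: generate pseudo pairs from the original training data and train on the union of original and generated data. All names are hypothetical placeholders; this is not any specific cited method.

```python
# Sketch of the generic data-augmentation recipe: create pseudo parallel pairs
# from the original training data (e.g., via a backward model), then train on
# the union of original and generated pairs. Names are placeholders.
def augment_and_merge(original_pairs, generate_pseudo_pair):
    pseudo_pairs = [generate_pseudo_pair(src, tgt) for src, tgt in original_pairs]
    return original_pairs + pseudo_pairs   # train on both original and pseudo data
```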
“…Zhou et al (2019) propose multi-task training with a denoising objective to improve the robustness of NMT models. Wang et al (2020) show that multi-task learning with two additional denoising tasks on the monolingual data can effectively improve translation quality. Our training strategy can also be viewed as multi-task learning as we train our multilingual model on single-source and bi-source inputs jointly.…”
Section: Related Work (mentioning)
confidence: 95%
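The denoising objectives mentioned in the statement above rely on corrupting monolingual input and training the model to reconstruct the clean sequence. Below is a small illustrative corruption function; the specific noise operations (token masking and word dropout) and their rates are common choices assumed for this sketch, not necessarily those used by the cited papers.

```python
# Illustrative noise function for a denoising objective on monolingual text:
# randomly mask some tokens and drop others, so the model learns to reconstruct
# the clean sequence. The noise types and rates are assumptions of this sketch.
import random

def add_noise(tokens, mask_id, mask_prob=0.15, drop_prob=0.1):
    """Return a corrupted copy of a token-id list."""
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < drop_prob:
            continue                      # word dropout: skip the token
        elif r < drop_prob + mask_prob:
            noisy.append(mask_id)         # masking: replace with <mask>
        else:
            noisy.append(tok)
    return noisy or [mask_id]             # avoid returning an empty sequence
```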
“…En→X: We follow the preprocessing steps in Wang et al (2020): we remove duplicated sentence pairs and the pairs with the same source and target sequences from the training corpora and then tokenize all data using SentencePiece (Kudo and Richardson, 2018) with a shared vocabulary size of 64K tokens. Table 2 shows the training data size after preprocessing and the test set for each language pair.…”
Section: Preprocessing (mentioning)
confidence: 99%
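The preprocessing described in the statement above (de-duplicating sentence pairs, removing pairs whose source equals the target, then shared SentencePiece tokenization with a 64K vocabulary) can be sketched as follows. The file names and the in-memory pair format are assumptions of this sketch, not the cited setup.

```python
# Minimal sketch of the described preprocessing: de-duplicate sentence pairs,
# drop pairs whose source equals the target, then learn and apply a shared
# SentencePiece model with a 64K vocabulary. File names are assumptions.
import sentencepiece as spm

def clean_pairs(pairs):
    """Remove duplicate pairs and pairs with identical source and target."""
    seen, cleaned = set(), []
    for src, tgt in pairs:
        if src == tgt or (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned

# Train a single shared subword model over the cleaned text of all languages.
spm.SentencePieceTrainer.train(
    input="all_languages_cleaned.txt",   # assumed concatenation of cleaned text
    model_prefix="shared64k",
    vocab_size=64000,
)

sp = spm.SentencePieceProcessor(model_file="shared64k.model")
tokens = sp.encode("Hello world", out_type=str)   # subword tokenization
```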