2021
DOI: 10.48550/arxiv.2110.08547
Preprint

Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation

Abstract: This paper demonstrates that multilingual pretraining, a proper fine-tuning method, and a large-scale parallel dataset from multiple auxiliary languages are all critical for zero-shot translation, where the NMT model is tested on source languages unseen during supervised training. Following this idea, we present SixT++, a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages. SixT++ initializes the decoder embedding and the …
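The abstract is truncated, but it points to the core recipe: start from a multilingual pretrained model and initialize the NMT model's embeddings (including the decoder embedding) from it before fine-tuning on parallel data from a few auxiliary source languages. The sketch below is a minimal, hypothetical illustration of that kind of initialization in plain PyTorch; the vocabulary size, model dimensions, module names, and the embedding-freezing step are assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical sketch: seeding an NMT model's embeddings from a pretrained
# multilingual encoder so unseen source languages share the same input space.
# All names and dimensions are illustrative, not from the paper.
import torch
import torch.nn as nn

VOCAB_SIZE, D_MODEL = 32000, 512  # assumed shared multilingual vocabulary

# Stand-in for a pretrained multilingual embedding (e.g., XLM-R style);
# in practice these weights would be loaded from a checkpoint.
pretrained_embedding = nn.Embedding(VOCAB_SIZE, D_MODEL)

class Seq2SeqNMT(nn.Module):
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, d_model)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.src_embed(src_ids)
        tgt = self.tgt_embed(tgt_ids)
        out = self.transformer(src, tgt)
        return self.generator(out)

model = Seq2SeqNMT(VOCAB_SIZE, D_MODEL)

# Copy the pretrained multilingual embedding into both the encoder and the
# decoder embedding tables before fine-tuning on the auxiliary parallel data.
with torch.no_grad():
    model.src_embed.weight.copy_(pretrained_embedding.weight)
    model.tgt_embed.weight.copy_(pretrained_embedding.weight)

# Optionally freeze the copied embeddings during an initial fine-tuning stage
# (a common practice; whether SixT++ does exactly this is not stated here).
model.src_embed.weight.requires_grad_(False)
model.tgt_embed.weight.requires_grad_(False)
```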

Cited by 3 publications (6 citation statements)
References 31 publications
“…This is possible, but our work explores a language-agnostic multilingual setting. We refer the interested reader to (Zhang et al., 2022), which finds through an empirical study that overall multilingual translation performance is best when languages are balanced.…”
Section: A Limitations (mentioning, confidence: 99%)
“…We evaluate our method on the FLoRes [13] low-resource unsupervised translation tasks on English (En) to and from Nepali (Ne), Sinhala (Si) and Gujarati (Gu) from the Indic family, as well as other low-resource languages: Latvian (Lv) and Estonian (Et) from the Uralic family and Kazakh (Kk) from the Turkic family. Although Hindi (Hi) and Finnish (Fi) are relatively high-resourced compared to their respective siblings, we still add them into the mix for Indic and Uralic families respectively to help the learning process of their respective low-resource siblings, following [12,8].…”
Section: Low-resource Unsupervised Translation (mentioning, confidence: 99%)
“…Training the model in such a multilingual environment helps the encoder to learn language-agnostic latent representations that are shared across multiple languages, allowing the decoder to translate from any language. The vast availability of high-resource siblings (e.g., Hindi for Indic languages) may also help improve the performance of low-resource MT significantly [12,8].…”
Section: Introduction (mentioning, confidence: 99%)
“…Training the model in such a multilingual environment helps the encoder learn language-agnostic latent representations that are shared across multiple languages, allowing the decoder to translate from any language. The vast availability of high-resource siblings (e.g., Hindi for Indic languages) may help improve the performance of low-resource MT significantly (Garcia et al., 2021; Chen et al., 2021a).…”
Section: Low-resource and Distant Languages (mentioning, confidence: 99%)