Multilingual Neural Machine Translation with Language Clustering

Tan, Xu; Chen, Jiale; He, Di; Xia, Yingce; Qin, Tao; Liu, Tie-Yan

doi:10.18653/v1/d19-1089

Cited by 167 publications (189 citation statements)

References 25 publications

Citation types: Supporting, Mentioning (184), Contrasting


“…Our conclusions are similar to that of works that have attempted to cluster learned language vectors: Östling and Tiedemann (2016); Tan et al (2019) both find that hierarchical clusters of language vectors discover linguistic similarity, with the former finding fine-grained clusterings for Germanic languages. In a similar vein, Tiedemann (2018) visualizes language vectors and find that they roughly cluster by linguistic family.…”

Section: Representations Cluster By Language Similarity (supporting)

confidence: 89%

“…Recent work on interpretability for NLU tasks uses methods such as diagnostic tasks (Belinkov et al, 2017; Tenney et al, 2019; Belinkov et al, 2018), attention based methods (Raganato and Tiedemann, 2018) or task analysis (Zhang and Bowman, 2018) and is primarily focused on understanding the linguistic features encoded by a trained model. Some recent works compare learned language vectors (Östling and Tiedemann, 2016; Tan et al, 2019; Tiedemann, 2018) and find conclusions similar to ours. To the best of our knowledge, we are the first to compare the hidden representations of the sentences themselves.…”

Section: SVCCA For Sequences (supporting)

confidence: 83%


Investigating Multilingual NMT Representations at Scale

Kudugunta, Bapna, Caswell, et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


Multilingual Neural Machine Translation (NMT) models have yielded large empirical success in transfer learning settings. However, these black-box representations are poorly understood, and their mode of transfer remains elusive. In this work, we attempt to understand massively multilingual NMT representations (with 103 languages) using Singular Value Canonical Correlation Analysis (SVCCA), a representation similarity framework that allows us to compare representations across different languages, layers and models. Our analysis validates several empirical results and long-standing intuitions, and unveils new observations regarding how representations evolve in a multilingual translation model. We draw three major conclusions from our analysis, with implications on cross-lingual transfer learning: (i) Encoder representations of different languages cluster based on linguistic similarity, (ii) Representations of a source language learned by the encoder are dependent on the target language, and vice-versa, and (iii) Representations of high resource and/or linguistically similar languages are more robust when fine-tuning on an arbitrary language pair, which is critical to determining how much cross-lingual transfer can be expected in a zero or few-shot setting. We further connect our findings with existing empirical observations in multilingual NMT and transfer learning.


“…We expect that the balance between language-agnostic and language-specific representations should depend on the language pairs. Prasanna [117], Tan et al [141] are some of the works that cluster languages into language families and train separate MNMT models per family. Language families can be decided by using linguistic knowledge [117] or by using embedding similarities where the embeddings are obtained from a multilingual word2vec model [141].…”

Section: Addressing Language Divergence (mentioning)

confidence: 99%

A Survey of Multilingual Neural Machine Translation

2020


We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years. MNMT has been useful in improving translation quality as a result of translation knowledge transfer (transfer learning). MNMT is more promising and interesting than its statistical machine translation counterpart, because end-to-end modeling and distributed representations open new avenues for research on machine translation. Many approaches have been proposed to exploit multilingual parallel corpora for improving translation quality. However, the lack of a comprehensive survey makes it difficult to determine which approaches are promising and, hence, deserve further exploration. In this article, we present an in-depth survey of existing literature on MNMT. We first categorize various approaches based on their central use-case and then further categorize them based on resource scenarios, underlying modeling principles, core issues, and challenges. Wherever possible, we address the strengths and weaknesses of several techniques by comparing them with each other. We also discuss the future directions for MNMT. This article is aimed towards both beginners and experts in NMT. We hope this article will serve as a starting point as well as a source of new ideas for researchers and engineers interested in MNMT.


“…In reinforcement learning, knowledge distillation has been used to regularize multi-task agents (Parisotto et al, 2016; Teh et al, 2017). In NLP, Tan et al (2019) distill single-language-pair machine translation systems into a many-language system. However, they focus on multilingual rather than multi-task learning, use a more complex training procedure, and only experiment with Single→Multi distillation.…”

Section: Related Work (mentioning)

confidence: 99%

BAM! Born-Again Multi-Task Networks for Natural Language Understanding

Clark, Luong, Khandelwal, et al. 2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics


It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.
