Consistency by Agreement in Zero-Shot Neural Machine Translation

Al-Shedivat, Maruan; Parikh, Ankur P.

doi:10.18653/v1/n19-1121

Cited by 48 publications

(26 citation statements)

References 47 publications

“…and enable zero-shot translation (i.e. direct translation between a language pair never seen in training) (Firat et al., 2016b; Johnson et al., 2017; Al-Shedivat and Parikh, 2019; Gu et al., 2019). Despite these potential benefits, multilingual NMT tends to underperform its bilingual counterparts (Johnson et al., 2017; Arivazhagan et al., 2019b) and results in considerably worse translation performance when many languages are accommodated (Aharoni et al., 2019).…”

Section: Source (mentioning)

confidence: 99%

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Zhang, Williams, Titov et al. 2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce the translation of unseen training language pairs. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by ∼10 BLEU, approaching conventional pivot-based methods.

“…Arivazhagan et al. (2019) addressed the zero-shot generalization problem that some translation directions have not been optimized well due to a lack of parallel data. Al-Shedivat and Parikh (2019) introduced a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in zero-shot translation, which share similarities with our RAT approach. However, in terms of a specific implementation, because of the differences between UNMT and NMT, we have provided three new UNMT methods, and have alleviated the problem of uncontrollable intermediate BT quality in UNMT.…”

Section: Related Work (mentioning)

confidence: 99%

Reference Language based Unsupervised Neural Machine Translation

Zhao, Wang et al. 2020

Findings of the Association for Computational Linguistics: EMNLP 2020

Exploiting a common language as an auxiliary for better translation has a long tradition in machine translation and lets supervised learning-based machine translation enjoy the enhancement delivered by the well-used pivot language in the absence of a source language to target language parallel corpus. The rise of unsupervised neural machine translation (UNMT) almost completely relieves the parallel corpus curse, though UNMT is still subject to unsatisfactory performance due to the vagueness of the clues available for its core back-translation training. Further enriching the idea of pivot translation by extending the use of parallel corpora beyond the source-target paradigm, we propose a new reference language-based framework for UNMT, RUNMT, in which the reference language only shares a parallel corpus with the source, but this corpus still indicates a signal clear enough to help the reconstruction training of UNMT through a proposed reference agreement mechanism. Experimental results show that our methods improve the quality of UNMT over that of a strong baseline that uses only one auxiliary language, demonstrating the usefulness of the proposed reference language-based UNMT and establishing a good start for the community.

“…In the binary latent variable setting, straight-through estimators are often used (Dong et al., 2019). Another choice is "continuous decoding" which takes a convex combination of latent values to make the loss differentiable (Al-Shedivat and Parikh, 2019). Yet a less considered choice is Hard EM (Brown et al., 1993; De Marcken, 1995; Spitkovsky et al., 2010).…”

Section: Related Work (mentioning)

confidence: 99%

Discrete Latent Variable Representations for Low-Resource Text Classification

Jin, Wiseman, Stratos et al. 2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space efficient. We consider several approaches to learning discrete latent variable models for text in the case where exact marginalization over these variables is intractable. We compare the performance of the learned representations as features for low-resource document and sentence classification. Our best models outperform the previous best reported results with continuous representations in these low-resource settings, while learning significantly more compressed representations. Interestingly, we find that an amortized variant of Hard EM performs particularly well in the lowest-resource regimes.

* Work done as an intern at Toyota Technological Institute at Chicago.
