Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019)
DOI: 10.18653/v1/n19-1121
Consistency by Agreement in Zero-Shot Neural Machine Translation

Abstract: Generalization and reliability of multilingual translation often depend heavily on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in mod…
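To illustrate the agreement idea named in the title, here is a minimal, hedged sketch (not the authors' implementation): given a parallel pair, the model's two zero-shot translations into a third language should place probability mass on the same target tokens, so their disagreement can be penalized. The distributions below are mocked; in training this term would be minimized alongside the supervised losses.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (eps avoids log 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def agreement_loss(dist_from_src_a, dist_from_src_b):
    """Symmetrized KL between the target-token distributions produced
    from the two sides of a parallel pair (a hypothetical agreement term)."""
    return 0.5 * (kl(dist_from_src_a, dist_from_src_b)
                  + kl(dist_from_src_b, dist_from_src_a))

# Identical output distributions agree perfectly (zero penalty):
same = [0.7, 0.2, 0.1]
print(agreement_loss(same, same))                          # 0.0
# Divergent distributions incur a positive penalty:
print(agreement_loss([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]))
```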

Cited by 48 publications (26 citation statements)
References 47 publications
“…and enable zero-shot translation (i.e. direct translation between a language pair never seen in training) (Firat et al., 2016b; Johnson et al., 2017; Al-Shedivat and Parikh, 2019; Gu et al., 2019). Despite these potential benefits, multilingual NMT tends to underperform its bilingual counterparts (Johnson et al., 2017; Arivazhagan et al., 2019b) and results in considerably worse translation performance when many languages are accommodated (Aharoni et al., 2019).…”
Section: Source
confidence: 99%
“…Arivazhagan et al. (2019) addressed the zero-shot generalization problem, in which some translation directions are poorly optimized due to a lack of parallel data. Al-Shedivat and Parikh (2019) introduced a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in zero-shot translation, which shares similarities with our RAT approach. However, in terms of the specific implementation, because of the differences between UNMT and NMT, we have provided three new UNMT methods and have alleviated the problem of uncontrollable intermediate back-translation quality in UNMT.…”
Section: Related Work
confidence: 99%
“…In the binary latent variable setting, straight-through estimators are often used (Dong et al., 2019). Another choice is “continuous decoding”, which takes a convex combination of latent values to make the loss differentiable (Al-Shedivat and Parikh, 2019). A less-considered choice is Hard EM (Brown et al., 1993; De Marcken, 1995; Spitkovsky et al., 2010).…”
Section: Related Work
confidence: 99%
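The two gradient-estimation choices contrasted in the statement above can be sketched in a few lines. This is a hedged, illustrative example (not code from any of the cited papers): a straight-through estimator uses a hard 0/1 decision in the forward pass while pretending the threshold was the identity in the backward pass, whereas continuous decoding keeps the loss differentiable by taking a probability-weighted convex combination of the latent values.

```python
import numpy as np

def hard_threshold(p):
    """Forward pass of a straight-through estimator: discretize a
    probability into a hard binary latent value."""
    return float(np.asarray(p) > 0.5)

def straight_through_grad(upstream_grad):
    """Backward pass: pass the gradient through unchanged, ignoring the
    non-differentiable threshold (the 'straight-through' approximation)."""
    return upstream_grad

def continuous_decode(p, value_if_one=1.0, value_if_zero=0.0):
    """Continuous decoding: a convex combination of the latent values,
    weighted by their probabilities, so the loss stays differentiable."""
    return p * value_if_one + (1.0 - p) * value_if_zero

p = 0.7
print(hard_threshold(p))      # 1.0 (hard decision)
print(continuous_decode(p))   # 0.7 (soft, differentiable decision)
```

The trade-off: the straight-through route keeps discrete behavior at inference time but uses a biased gradient, while continuous decoding gives exact gradients at the cost of never committing to a discrete value during training.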