Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.101

Improving Zero-Shot Translation by Disentangling Positional Information

Abstract: Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional correspondence to …
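
The claim in the abstract can be illustrated with a small PyTorch sketch (all settings are our own assumptions, not the paper's): with the residual path intact, each encoder output position remains strongly correlated with the input embedding at the same position, which is the positional correspondence the paper sets out to remove.

```python
import torch
import torch.nn as nn

# Illustrative sketch (settings are assumptions, not from the paper): with the
# residual path intact, encoder output position i stays correlated with the
# input embedding at position i.
torch.manual_seed(0)
d_model, seq_len = 512, 16

emb = torch.randn(1, seq_len, d_model)  # stand-in token embeddings
layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
layer.eval()  # disable dropout for a deterministic check

with torch.no_grad():
    out = layer(emb)  # the residual path carries emb straight into out

# Per-position cosine similarity between input and output:
sim = torch.cosine_similarity(emb, out, dim=-1)  # shape (1, seq_len)
print(f"mean input-output similarity: {sim.mean():.2f}")  # clearly above 0
```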

Cited by 13 publications (10 citation statements) · References 23 publications
“…We regard T-ENC as a strong baseline, which has been demonstrated to perform better for zero-shot translation. For comparison, we also include residual-removed models (Liu et al., 2021) with S-ENC-T-DEC tagging, which are also claimed to be effective for zero-shot translation. For multilingual translation models, we train a Transformer-big model with 1840K tokens per batch for 50K updates.…”
Section: Multi-source Test Set
confidence: 99%
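
For a rough sense of scale, the quoted batch size can be decomposed as tokens per GPU × number of GPUs × gradient-accumulation steps. The sketch below is purely illustrative arithmetic; the GPU count and per-GPU token budget are assumptions, not details reported in the cited work.

```python
# Purely illustrative arithmetic: one plausible decomposition of the quoted
# "1840K tokens per batch"; the GPU count and per-GPU token budget are
# assumptions, not details reported in the cited work.
tokens_per_gpu = 4096   # a common per-GPU token budget for Transformer-big
num_gpus = 8            # assumed
grad_accum_steps = 56   # gradient accumulation ("update frequency")

effective_batch = tokens_per_gpu * num_gpus * grad_accum_steps
print(f"{effective_batch / 1000:.0f}K tokens per update")  # 1835K, i.e. ~1840K
total_updates = 50_000  # training length quoted above
```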
“…In this experiment, we include the positional disentangled encoder as a comparison, since it is reported to improve zero-shot translation (Liu et al., 2020a; Liu et al., 2021). Specifically, we remove the residual connection at the 23rd (penultimate) encoder layer in the second training stage, as suggested by Liu et al. (2021).…”
Section: Effect of Positional Disentangled Encoder
confidence: 99%
“…It reduces the strong positional correspondence between the input text and the output encoder representation brought by residual connections, thus facilitating cross-lingual transfer. We refer the readers to Liu et al. (2020a) and Liu et al. (2021) for more details.…”
Section: Effect of Positional Disentangled Encoder
confidence: 99%
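
The mechanism described in these statements can be sketched as follows. This is an illustrative re-implementation under our own assumptions (post-norm layers, toy layer sizes), not the authors' code; the key change is dropping the residual shortcut around self-attention in a single middle layer.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Post-norm Transformer encoder layer with a switch to drop the residual
    connection around self-attention (layer structure is an assumption)."""

    def __init__(self, d_model=512, nhead=8, dim_ff=2048, keep_residual=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                nn.Linear(dim_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.keep_residual = keep_residual

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        # Dropping the residual here severs the position-wise shortcut from
        # input token i to output position i.
        x = self.norm1(x + attn_out if self.keep_residual else attn_out)
        return self.norm2(x + self.ff(x))

# Toy 6-layer encoder with the residual removed only at the penultimate layer,
# mirroring the quoted setup (layer 23 of a deep encoder in the cited paper):
encoder = nn.Sequential(*[EncoderLayer(keep_residual=(i != 4)) for i in range(6)])
out = encoder(torch.randn(2, 10, 512))  # (batch, seq_len, d_model)
```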
“…One line of research holds that the success of zero-shot translation depends on the model's ability to learn language-invariant features, or an interlingua, for cross-lingual transfer (Arivazhagan et al., 2019a; Ji et al., 2020; Liu et al., 2021). Arivazhagan et al. (2019a) design auxiliary losses on the NMT encoder that impose representational invariance across languages.…”
Section: Introduction
confidence: 99%
“…Arivazhagan et al. (2019a) design auxiliary losses on the NMT encoder that impose representational invariance across languages. Ji et al. (2020) build a universal encoder for different languages via bridge-language model pre-training, while Liu et al. (2021) disentangle positional information in multilingual NMT to obtain language-agnostic representations. In addition, Gu et al. (2019) point out that conventional multilingual NMT models heavily capture spurious correlations between the output language and language-invariant semantics due to the maximum-likelihood training objective, making it hard to generate reasonable translations in an unseen language.…”
Section: Introduction
confidence: 99%
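
As an illustration of the auxiliary-loss line of work attributed to Arivazhagan et al. (2019a), a common formulation penalizes the distance between pooled encoder representations of a parallel sentence pair. The sketch below is a generic version of such a regularizer under our own assumptions, not the exact published loss.

```python
import torch
import torch.nn.functional as F

def invariance_loss(enc_src: torch.Tensor, enc_tgt: torch.Tensor,
                    src_mask: torch.Tensor, tgt_mask: torch.Tensor) -> torch.Tensor:
    """Pull together mean-pooled encoder states of a parallel sentence pair.
    enc_*: (batch, seq, d_model); *_mask: (batch, seq), 1 for real tokens.
    A generic similarity regularizer, not the exact published loss."""
    def pool(h, mask):
        m = mask.unsqueeze(-1).float()
        return (h * m).sum(dim=1) / m.sum(dim=1).clamp(min=1.0)
    src_vec, tgt_vec = pool(enc_src, src_mask), pool(enc_tgt, tgt_mask)
    return 1.0 - F.cosine_similarity(src_vec, tgt_vec, dim=-1).mean()

# Hypothetical training objective combining NMT cross-entropy with the
# auxiliary term: loss = ce_loss + lambda_inv * invariance_loss(...)
```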