Findings of the Association for Computational Linguistics: EMNLP 2023
DOI: 10.18653/v1/2023.findings-emnlp.859
Code-Switching with Word Senses for Pretraining in Neural Machine Translation

Vivek Iyer, Edoardo Barba, Alexandra Birch, et al.

Abstract: Lexical ambiguity is a significant and pervasive challenge in Neural Machine Translation (NMT), with many state-of-the-art (SOTA) NMT systems struggling to handle polysemous words (Campolungo et al., 2022a). The same holds for the NMT pretraining paradigm of denoising synthetic "code-switched" text (Pan et al., 2021; Iyer et al., 2023), where word senses are ignored in the noising stage, leading to harmful sense biases in the pretraining data that are subsequently inherited by the resulting models. In this work…
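
To illustrate the noising step the abstract refers to, here is a minimal sketch of sense-unaware code-switching, assuming a toy one-translation-per-word bilingual lexicon (the lexicon, words, and probability are illustrative, not taken from the paper). Because the lexicon maps each surface form to a single translation, a polysemous word like "bank" is always replaced the same way, regardless of its sense in context; this is the kind of sense bias the paper targets.

```python
import random

# Hypothetical sense-unaware English-to-German lexicon: one translation per
# surface form. "bank" always maps to "Bank" (financial institution), even
# when the intended sense is "river bank" (German: "Ufer").
LEXICON = {
    "bank": "Bank",
    "river": "Fluss",
    "sat": "sass",
}

def code_switch(tokens, lexicon, p=0.3, rng=random):
    """Replace each token with its lexicon translation with probability p,
    producing the noisy 'code-switched' input used for denoising pretraining."""
    return [
        lexicon[tok] if tok in lexicon and rng.random() < p else tok
        for tok in tokens
    ]

sentence = "they sat on the bank of the river".split()
print(code_switch(sentence, LEXICON, p=1.0))
# -> ['they', 'sass', 'on', 'the', 'Bank', 'of', 'the', 'Fluss']
# "Bank" is wrong here: the river-bank sense calls for "Ufer". A sense-aware
# noiser would disambiguate "bank" in context before choosing a translation.
```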
