Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.268

Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection

Abstract: This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT), where we assume access to only monolingual text in either the source or target language in the new domain. We propose a cross-lingual data selection method to extract in-domain sentences in the missing language side from a large generic monolingual corpus. Our proposed method trains an adaptive layer on top of multilingual BERT by contrastive learning to align the representation between the source and target…
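The abstract describes training an adaptive layer with a contrastive objective so that source- and target-language embeddings of translation pairs are pulled together. A minimal sketch of such a contrastive (InfoNCE-style) objective is shown below; the function name, batch shapes, and temperature are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def info_nce_loss(src, tgt, temperature=0.1):
    """Contrastive (InfoNCE) loss aligning source/target sentence embeddings.

    src, tgt: (n, d) arrays; row i of src and row i of tgt form a translation
    pair (the positive), while all other rows in the batch act as negatives.
    """
    # L2-normalise so dot products are cosine similarities
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / temperature           # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # softmax cross-entropy with the diagonal (true pairs) as labels
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
tgt = rng.normal(size=(8, 16))
# Near-identical ("aligned") embeddings should incur a much lower loss
# than unrelated random embeddings.
aligned_loss = info_nce_loss(tgt + 0.01 * rng.normal(size=(8, 16)), tgt)
random_loss = info_nce_loss(rng.normal(size=(8, 16)), tgt)
```

Minimising this loss drives paired cross-lingual embeddings toward each other while pushing apart non-pairs in the batch, which is the alignment property the data selection step then exploits.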

Cited by 5 publications (4 citation statements)
References 30 publications
“…In summary, the combination of different methods is the key to obtaining high-quality alignments, see for example Fei, Zhang, and Ji (2020), Steingrimsson, Loftsson, and Way (2021), Vu et al (2021). For the creation of interlinear glossed biblical texts, it is crucial to really understand the detailed concepts of the languages, and either to use large training corpora or to supervise the results of these methods, for example by manually curating the texts.…”
Section: Related Work
confidence: 99%
“…Cross-Lingual Data Selection Vu et al (2021) proposed a generalized unsupervised domain adaptation technique (GUDA) for NMT where only monolingual data from either the source or target language is available in the new domain. A cross-lingual data selection method is introduced to select relevant in-domain sentences from a large monolingual corpus for the language without in-domain data.…”
Section: Unsupervised Domain Adaptation With
confidence: 99%
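The statement above summarises the paper's core step: ranking sentences from a large generic monolingual corpus by their similarity to known in-domain text, for the language side that lacks in-domain data. A minimal sketch of such similarity-based selection follows; the centroid-ranking scheme, function name, and synthetic embeddings are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def select_in_domain(generic_emb, in_domain_emb, k):
    """Rank generic-corpus sentence embeddings by cosine similarity to the
    centroid of known in-domain embeddings; return the top-k row indices."""
    def normalise(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    centroid = normalise(normalise(in_domain_emb).mean(axis=0))
    scores = normalise(generic_emb) @ centroid   # cosine similarity per row
    return np.argsort(scores)[::-1][:k]          # highest-scoring first

rng = np.random.default_rng(1)
d = 32
domain_axis = np.zeros(d)
domain_axis[0] = 1.0
# Five embeddings of known in-domain sentences, clustered on one axis
in_domain = domain_axis + 0.1 * rng.normal(size=(5, d))
# A generic corpus of 100 sentences; rows 0-9 are planted in-domain lookalikes
generic = rng.normal(size=(100, d))
generic[:10] = domain_axis + 0.1 * rng.normal(size=(10, d))
picked = select_in_domain(generic, in_domain, k=10)
```

In this toy setup the ten planted rows score far above the random rows, so the top-10 selection recovers exactly the in-domain lookalikes; in the paper's setting, the embeddings would come from the contrastively aligned encoder so that scoring works across languages.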
“…This suggests a promising avenue for future research on using synthetically generated monolingual data to improve MT for specialized domains where even monolingual data is scarce. Furthermore, Vu et al (2021a) suggest that one can leverage a retrieval-based approach to obtain monolingual sentences from the generic data stores. This retrieved monolingual data is then employed to improve the translation quality in a domain adaptation setting.…”
Section: Limitations
confidence: 99%