Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 2: Short Papers) 2017
DOI: 10.18653/v1/p17-2089

Sentence Embedding for Neural Machine Translation Domain Adaptation

Abstract: Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance. Recently, Neural Machine Translation (NMT) has become prominent in the field. However, most of the existing domain adaptation methods only focus on phrase-based machine translation. In this paper, we exploit the NMT's internal embedding of the source sentence and use the sentence embedding similarity to select the sentences…
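The selection criterion described in the abstract can be illustrated with a minimal sketch. It assumes sentence embeddings have already been extracted from the NMT encoder (for instance by averaging encoder hidden states); the function name, the cosine-similarity scoring, and the top-k cutoff are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def select_in_domain_like(candidate_vecs, in_domain_vecs, k):
    # Rank out-of-domain sentences by cosine similarity of their
    # (pre-computed) encoder sentence embeddings to the centroid of the
    # in-domain embeddings, and return the indices of the top-k.
    centroid = in_domain_vecs.mean(axis=0)
    centroid = centroid / (np.linalg.norm(centroid) + 1e-12)
    norms = np.linalg.norm(candidate_vecs, axis=1, keepdims=True) + 1e-12
    sims = (candidate_vecs / norms) @ centroid   # cosine similarity to centroid
    return np.argsort(-sims)[:k]                 # most in-domain-like first

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
out_domain = rng.normal(size=(1000, 512))
in_domain = rng.normal(size=(200, 512))
selected = select_in_domain_like(out_domain, in_domain, k=100)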

Cited by 88 publications (66 citation statements, published 2018–2023). References 14 publications.
“…Meanwhile, applying data weighting to NMT domain adaptation has attracted much attention. Wang et al. (2017a) and Wang et al. (2017b) proposed several sentence and domain weighting methods with a dynamic weight learning strategy. Zhang et al. (2019a) ranked unlabeled domain training samples based on their similarity to in-domain data, and then adopted a probabilistic curriculum learning strategy during training.…”
Section: Related Work
confidence: 99%
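As a rough illustration of the sentence-weighting idea mentioned in this statement, a per-sentence weight derived from domain similarity can scale each sentence's contribution to the training loss. This is a toy sketch only; the cited methods learn their weights dynamically during training, and the softmax weighting and temperature below are assumptions.

import numpy as np

def weighted_batch_loss(per_sentence_losses, similarity_scores, temperature=1.0):
    # Turn similarity-to-in-domain scores into normalized weights (softmax
    # over the batch) so that in-domain-like sentences contribute more.
    logits = similarity_scores / temperature
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()
    return float((weights * per_sentence_losses).sum())

# Example: three sentences, the second most in-domain-like.
print(weighted_batch_loss(np.array([2.1, 1.7, 2.5]), np.array([0.2, 0.9, 0.1])))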
“…To model texts in a vector space, the input tokens are represented as embeddings in deep learning models [28,29,30,45,46,55,57]. Previous work has shown that word representations in NLP tasks can benefit from character-level models, which aim to learn language representations directly from characters.…”
Section: Related Work A. Augmented Embedding
confidence: 99%
“…Selecting NMT training samples similar to the in-domain data from the out-of-domain parallel corpus has been explored in [13]. The central idea of this work is to use both in-domain and out-of-domain parallel corpora to train an NMT system.…”
Section: Data Selection for MT Training
confidence: 99%
“…Using each of these approaches, we generate sentence embedding vectors for the in-domain (F_in) and out-of-domain (F_out) target-side sentences. Along similar lines to [13], data selection is based on the relative distance δ_s of a sentence vector v_s w.r.t. the in-domain and out-of-domain centroids C_F_in and C_F_out respectively, indicated by Eq.…”
Section: Data Selection for MT Training
confidence: 99%
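The equation referenced in the statement above is truncated on this page, but the selection rule it describes can be sketched as follows: compute the centroids of the in-domain (F_in) and out-of-domain (F_out) embeddings, score each sentence by the difference of its distances to the two centroids, and keep sentences whose score falls below a threshold. The Euclidean distance, the sign convention, and the zero threshold are assumptions for illustration, not the exact published formula.

import numpy as np

def centroid_distance_scores(sentence_vecs, in_domain_vecs, out_domain_vecs):
    # delta_s = ||v_s - C_F_in|| - ||v_s - C_F_out||
    # Smaller (more negative) scores mean the sentence lies closer to the
    # in-domain centroid, i.e. is more in-domain-like under this convention.
    c_in = in_domain_vecs.mean(axis=0)
    c_out = out_domain_vecs.mean(axis=0)
    d_in = np.linalg.norm(sentence_vecs - c_in, axis=1)
    d_out = np.linalg.norm(sentence_vecs - c_out, axis=1)
    return d_in - d_out

# Keep candidates closer to the in-domain centroid than to the out-of-domain one
# (assumed threshold of 0.0); random vectors stand in for real embeddings.
rng = np.random.default_rng(1)
vecs = rng.normal(size=(500, 256))
f_in = rng.normal(loc=0.1, size=(100, 256))
f_out = rng.normal(size=(400, 256))
keep = np.where(centroid_distance_scores(vecs, f_in, f_out) < 0.0)[0]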