Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1078

Iterative Dual Domain Adaptation for Neural Machine Translation

Abstract: Previous studies on domain adaptation for neural machine translation (NMT) mainly focus on one-pass transfer of out-of-domain translation knowledge to the in-domain NMT model. In this paper, we argue that such a strategy fails to fully extract the domain-shared translation knowledge, and that repeatedly utilizing corpora of different domains can lead to better distillation of domain-shared translation knowledge. To this end, we propose an iterative dual domain adaptation framework for NMT. Specifically, we first…
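The truncated abstract describes transferring translation knowledge back and forth between an out-of-domain and an in-domain NMT model via knowledge distillation. As a rough illustration only, not the authors' implementation, the sketch below assumes PyTorch-style seq2seq models exposing a model(src, tgt) call that returns teacher-forced logits, and uses a standard token-level distillation loss with an interpolation weight lam and temperature tau; target shifting and padding masks are omitted for brevity.

# Minimal sketch of iterative dual knowledge distillation between an
# in-domain and an out-of-domain NMT model (illustrative assumptions only:
# model interface, data iterators, and hyperparameters are placeholders).
import torch
import torch.nn.functional as F

def distill_step(student, teacher, batch, optimizer, lam=0.5, tau=1.0):
    """One training step: cross-entropy on gold targets plus a KL term
    pulling the student's token distributions toward the teacher's."""
    src, tgt = batch                      # (batch, src_len), (batch, tgt_len)
    student_logits = student(src, tgt)    # (batch, tgt_len, vocab), teacher-forced
    with torch.no_grad():
        teacher_logits = teacher(src, tgt)

    vocab = student_logits.size(-1)
    ce = F.cross_entropy(student_logits.view(-1, vocab), tgt.view(-1))
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    )
    loss = (1.0 - lam) * ce + lam * kd

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def iterative_dual_adaptation(in_model, out_model, in_data, out_data,
                              in_opt, out_opt, rounds=3):
    """Alternate the direction of knowledge transfer for several rounds:
    out-of-domain -> in-domain, then in-domain -> out-of-domain."""
    for _ in range(rounds):
        for batch in in_data:     # adapt the in-domain model, guided by the out-of-domain teacher
            distill_step(in_model, out_model, batch, in_opt)
        for batch in out_data:    # transfer knowledge back to the out-of-domain model
            distill_step(out_model, in_model, batch, out_opt)

The key design choice illustrated here is that neither model is treated as a fixed teacher: each round distills in both directions, so domain-shared knowledge accumulates in both models rather than flowing only one way.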

Cited by 30 publications (20 citation statements)
References 24 publications
“…Ruder et al. (2017) introduced this idea as "Knowledge Adaptation," using multi-layer perceptrons to provide sentiment analysis labels for unlabeled in-domain data. Similar work includes Iterative Dual Domain Adaptation (Zeng et al., 2019) and Domain Transformation Networks. … independently showed that knowledge distillation could be used to compress pre-trained models without affecting downstream tasks. Tang et al. (2019) showed that task-specific information could be distilled from a large Transformer into a much smaller Bi-directional RNN.…”
Section: Related Work (mentioning)
confidence: 99%
“…Britz et al. (2017) add a discriminator to extract common features across domains. There is also some work (Zeng et al., 2018, 2019; Gu et al., 2019) that adds domain-specific modules to the model to preserve domain-specific features. Currey et al. (2020) distill multiple expert models into a single student model.…”
Section: Related Work (mentioning)
confidence: 99%
“…For the machine translation task, similar to the settings of Luong and Manning (2015), Wang et al. (2017), and Zeng et al. (2019), we use the IWSLT 2016 English (EN) to German (DE) corpus (Cettolo et al., 2016) as the in-domain data. This corpus contains about 202K sentences from TED talks.…”
Section: Datasets (mentioning)
confidence: 99%
“…• IDDA: the Iterative Dual Domain Adaptation method proposed by Zeng et al. (2019), which iteratively performs bidirectional translation knowledge transfer between the in-domain and out-of-domain models using knowledge distillation. Note that this method targets performance in both domains, whereas in this paper we focus only on in-domain performance.…”
Section: Baselines (mentioning)
confidence: 99%