Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop 2019
DOI: 10.18653/v1/p19-2017
Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation

Abstract: This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT). In our method, we initialize the weights of the encoder and decoder with two language models that are trained with monolingual data, and then fine-tune the model on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting the original language modeling tasks. We compare the regularization by EWC with previous work that focuses on regularization by language modeling objectives. The po…
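To make the described procedure concrete, the sketch below illustrates the standard EWC penalty applied during fine-tuning: the translation loss is augmented with (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2, where theta* are the weights inherited from the monolingual language models and F is a diagonal Fisher estimate computed on the language-modeling task. This is a minimal PyTorch-style sketch under those assumptions, not the authors' implementation; names such as diagonal_fisher, ewc_penalty, lm_loss_fn, and monolingual_batches are hypothetical.

```python
import torch

def diagonal_fisher(lm_model, monolingual_batches, lm_loss_fn):
    """Estimate a diagonal Fisher information matrix on the language-modeling task."""
    fisher = {n: torch.zeros_like(p) for n, p in lm_model.named_parameters()}
    for batch in monolingual_batches:
        lm_model.zero_grad()
        lm_loss_fn(lm_model, batch).backward()      # gradients of the LM objective
        for n, p in lm_model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2   # squared gradients ~ diagonal Fisher
    return {n: f / max(len(monolingual_batches), 1) for n, f in fisher.items()}


def ewc_penalty(model, pretrained_params, fisher, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:                              # only weights inherited from the LMs
            penalty = penalty + (fisher[n] * (p - pretrained_params[n]) ** 2).sum()
    return 0.5 * lam * penalty


# Schematic fine-tuning step on parallel data:
#   loss = translation_loss(model, parallel_batch) + ewc_penalty(model, theta_star, fisher)
#   loss.backward(); optimizer.step()
```

The strength lam trades off retention of the pretrained language-modeling behavior against fitting the parallel data.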



Cited by 16 publications (10 citation statements); references 19 publications.
“…A stream of research focused on techniques more traditionally associated with CL has also been present. In the works of Miceli Barone et al. (2017); Variš and Bojar (2019); Thompson et al. (2019), regularization approaches (e.g. EWC) were leveraged.…”
Section: Machine Translation (mentioning)
confidence: 99%
“…Adapting a multilingual model to a new language pair and domain adaptation: Prior works on adaptation (Neubig and Hu, 2018; Variš and Bojar, 2019; Stickland et al., 2020; Escolano et al., 2020; Akella et al., 2020; Zhang et al., 2020) aim at improving language-specific performance by either fine-tuning the same MNMT model or adding language-specific modules. While effective, these methods either lose their multilinguality or introduce additional parameters.…”
Section: Related Work (mentioning)
confidence: 99%
“…Adaptation to a new language pair may be addressed by training a multilingual model and then fine-tuning it with parallel data in the language pair of interest (Neubig and Hu, 2018; Variš and Bojar, 2019; Stickland et al., 2020). Escolano et al. (2020) propose plug-and-play encoders and decoders per language, which take advantage of a single representation in each language but at the cost of larger model sizes.…”
Section: Related Work (mentioning)
confidence: 99%