Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.615
Language Model Prior for Low-Resource Neural Machine Translation

Abstract: The scarcity of large parallel corpora is an important obstacle for neural machine translation. A common solution is to exploit the knowledge of language models (LM) trained on abundant monolingual data. In this work, we propose a novel approach to incorporate a LM as prior in a neural translation model (TM). Specifically, we add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior, while avoiding wrong predictions when the TM "disagrees" with the LM. This ob…
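As a reading aid, the following is a minimal sketch of how such an LM-prior regularizer can be attached to a standard NMT training loss in PyTorch. It is not the authors' released implementation: the function name lm_prior_loss, the weight lambda_prior, the temperature tau, and the choice of KL direction are illustrative assumptions consistent with the abstract's description, not the paper's exact formulation.

# Minimal sketch (assumed formulation, not the paper's exact objective): combine the
# usual token-level cross-entropy of the translation model (TM) with a KL term that
# pulls the TM's output distribution toward a frozen language-model (LM) prior.
import torch
import torch.nn.functional as F

def lm_prior_loss(tm_logits, lm_logits, targets, pad_id, lambda_prior=0.5, tau=1.0):
    # tm_logits, lm_logits: (batch, seq_len, vocab); targets: (batch, seq_len)
    # Standard cross-entropy against the reference translation, ignoring padding.
    ce = F.cross_entropy(
        tm_logits.reshape(-1, tm_logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,
    )

    tm_log_probs = F.log_softmax(tm_logits / tau, dim=-1)
    with torch.no_grad():  # the LM prior is frozen; no gradients flow into it
        lm_log_probs = F.log_softmax(lm_logits / tau, dim=-1)

    # Per-position KL(p_TM || p_LM): penalizes TM probability mass that the LM
    # prior considers unlikely.
    kl = (tm_log_probs.exp() * (tm_log_probs - lm_log_probs)).sum(dim=-1)

    # Average the KL term over non-padding target positions only.
    mask = targets.ne(pad_id).float()
    kl = (kl * mask).sum() / mask.sum().clamp(min=1.0)

    return ce + lambda_prior * kl

In use, tm_logits would come from the translation model's decoder and lm_logits from a target-side language model run over the same prefixes; only the KL weight and temperature need tuning.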

Cited by 45 publications (47 citation statements) | References 34 publications

Citation statements (ordered by relevance):
“…Some explore data symmetry (Freitag and Firat, 2020; Birch et al., 2008; Lin et al., 2019). Zero-shot translation in severely low-resource settings exploits massive multilinguality, cross-lingual transfer, pretraining, iterative back-translation, and the freezing of subnetworks (Lauscher et al., 2020; Nooralahzadeh et al., 2020; Pfeiffer et al., 2020; Baziotis et al., 2020; Chronopoulou et al., 2020; Lin et al., 2020; Thompson et al., 2018; Luong et al., 2014; Dou et al., 2020).…”
Section: Machine Polyglotism and Pretraining (mentioning)
confidence: 99%
“…We train NMT models using a language model prior, following Baziotis et al. (2020). This method allows us to make use of the additional monolingual data we gathered (cf.…”
Section: Language Model Prior (mentioning)
confidence: 99%
“…By reordering target-language sentences into source-language syntactic structure and then mapping target-language words into source-language words with a dictionary, the size of the parallel data is enlarged and translation performance is improved. Baziotis et al. (2020) leverage a language model to enhance the performance of the translation model. Similar to the idea of knowledge distillation (Hinton et al., 2015), a teacher model and a student model are trained, where the language model plays the role of the teacher and the translation model plays the role of the student.…”
Section: Low-Resource MT (mentioning)
confidence: 99%
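The teacher-student reading in the statement above corresponds, under assumed notation, to an objective of roughly the following form, with the LM acting as teacher and the TM as student; this is an illustrative formulation, not a formula quoted from the paper.

% Illustrative distillation-style objective (notation assumed):
% x: source sentence, y_{<t}: target prefix, \lambda: weight of the LM-prior term,
% p_{TM}, p_{LM}: output distributions of the translation model and the language model.
\mathcal{L}(\theta) \;=\; -\sum_{t} \log p_{TM}(y_t \mid y_{<t}, x; \theta)
  \;+\; \lambda \sum_{t} D_{\mathrm{KL}}\!\big( p_{TM}(\cdot \mid y_{<t}, x; \theta) \,\big\|\, p_{LM}(\cdot \mid y_{<t}) \big)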