Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.80

Self-Paced Learning for Neural Machine Translation

Abstract: Recent studies have shown that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, the achievements of such curriculum learning rely on the quality of an artificial schedule drawn up with handcrafted features, e.g., sentence length or word rarity. We ameliorate this procedure in a more flexible manner by proposing self-paced learning, where the NMT model is allowed to 1) automatically quantify the learning confidence over training ex…
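
The core idea in the abstract, weighting each training example by the model's own confidence instead of by a handcrafted schedule, can be sketched as follows. This is a hedged illustration in PyTorch, not the paper's exact formulation: the per-sentence likelihood used as the confidence signal and the pacing hyper-parameter `lam` are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def self_paced_loss(logits, targets, pad_id, lam=0.5):
    """Confidence-weighted NMT loss (illustrative sketch, not the paper's exact method).

    logits: (batch, seq_len, vocab); targets: (batch, seq_len) token ids.
    """
    # Per-token cross entropy, ignoring padding positions.
    token_loss = F.cross_entropy(
        logits.transpose(1, 2), targets, ignore_index=pad_id, reduction="none"
    )  # (batch, seq_len)
    mask = (targets != pad_id).float()
    # Average negative log-likelihood per sentence.
    sent_nll = (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    # Confidence in (0, 1]: high when the model already assigns high probability
    # to the reference. Detached so the weight itself receives no gradient.
    confidence = torch.exp(-sent_nll).detach()
    # Self-paced weighting: low-confidence ("hard") sentences contribute less, with
    # `lam` (an assumed hyper-parameter) controlling how aggressive the pacing is.
    weights = confidence ** lam
    return (weights * sent_nll).mean()
```

In early training, most sentences receive low confidence and the effective learning signal is dominated by examples the model already handles well; as confidence rises, harder sentences are gradually weighted back in, which is the self-paced behaviour the abstract describes.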

Cited by 30 publications (26 citation statements) | References 25 publications
“…By heightening the exposure of low-frequency tokens during training, these models can meliorate the neglect of low-frequency tokens and improve the lexical diversity of the translations. However, simply promoting low-frequency tokens via loss re-weighting may potentially sacrifice the learning of high-frequency ones (Gu et al. 2020; Wan et al. 2020; Zhou et al. 2020). Besides, our further investigation on these methods reveals that generating more unusual tokens comes at the unexpected expense of their prediction precision (Section 5.4).…”
Section: Introduction (mentioning)
Confidence: 93%
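
The loss re-weighting discussed in the statement above can be illustrated with a small sketch. The inverse-frequency weighting and the `token_freq` tensor are assumptions for this example rather than the exact method of any of the cited papers; the point is only to show how boosting rare tokens in the loss implicitly down-weights frequent ones.

```python
import torch
import torch.nn.functional as F

def frequency_weighted_loss(logits, targets, token_freq, pad_id):
    """Token-level cross entropy up-weighted for rare tokens (illustrative sketch).

    logits: (batch, seq_len, vocab); targets: (batch, seq_len);
    token_freq: (vocab,) raw corpus counts (assumed precomputed).
    """
    token_loss = F.cross_entropy(
        logits.transpose(1, 2), targets, ignore_index=pad_id, reduction="none"
    )  # (batch, seq_len)
    # Inverse-frequency weights, normalised to mean 1 so the overall loss scale
    # is preserved; rare tokens end up with weights > 1, frequent ones < 1.
    inv_freq = 1.0 / token_freq.float().clamp(min=1.0)
    weights = inv_freq / inv_freq.mean()
    per_token_weight = weights[targets]  # (batch, seq_len)
    mask = (targets != pad_id).float()
    return (per_token_weight * token_loss * mask).sum() / mask.sum().clamp(min=1.0)
```

Because the weights are normalised to a fixed budget, every unit of extra emphasis on rare tokens is taken from frequent ones, which is exactly the trade-off the citing authors point out.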
“…For example, Cai et al. (2020) propose an adaptive multi-curricula learning framework to train the dialogue model with an easy-to-complex dataset based on various concepts of difficulty, including the specificity and repetitiveness of the response, the relevance between the query and the response, etc. Also, Wan et al. (2020) resolve this problem by introducing self-paced learning (Kumar et al., 2010), which is a special kind of curriculum learning (Eppe et al., 2019). Wan et al. (2020) measure the level of confidence on each training example, where an easy sample is one of high confidence under the currently trained model.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Also, Wan et al (2020) resolves this problem by introducing selfpaced learning (Kumar et al, 2010), which is a special kind of curriculum learning (Eppe et al, 2019). Wan et al (2020) measures the level of confidence on each training example, where an easy sample is actually one of high confidence by the current trained model. Both curriculum learning and self-paced learning suggest that samples should be selected in a meaningful order for training.…”
Section: Introductionmentioning
confidence: 99%
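
For contrast with self-paced learning, the "meaningful order" mentioned above can be made concrete with a minimal curriculum-ordering sketch, assuming source-sentence length as the handcrafted difficulty signal; this illustrates the general scheme, not the schedule of any cited paper.

```python
def curriculum_order(corpus):
    """Sort (source, target) pairs from shortest (easiest) to longest (hardest) source."""
    return sorted(corpus, key=lambda pair: len(pair[0]))

# Hypothetical toy corpus of token-id sequences: short sentences are visited first.
corpus = [([5, 8, 2, 9, 7, 1], [4, 3, 6]), ([5, 1], [2, 7]), ([3, 4, 9], [1, 8, 2])]
for src, tgt in curriculum_order(corpus):
    pass  # feed (src, tgt) to the NMT trainer in this easy-to-hard order
```

Self-paced learning replaces this fixed, feature-based ordering with weights derived from the model's own confidence, as sketched after the abstract above.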
“…Neural machine translation (NMT) employs an end-to-end framework (Sutskever et al., 2014) and has achieved promising results on various sentence-level translation tasks (Bahdanau et al., 2015; Gehring et al., 2017; Vaswani et al., 2017; Wan et al., 2020). However, most NMT models handle sentences independently, regardless of the linguistic context that may appear outside the current sentence (Tiedemann and Scherrer, 2017a).…”
Section: Introduction (mentioning)
Confidence: 99%