RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning 2017
DOI: 10.26615/978-954-452-049-6_050
Curriculum Learning and Minibatch Bucketing in Neural Machine Translation

Abstract: We examine the effects of particular orderings of sentence pairs on the on-line training of neural machine translation (NMT). We focus on two types of such orderings: (1) ensuring that each minibatch contains sentences similar in some aspect and (2) gradual inclusion of some sentence types as the training progresses (so-called "curriculum learning"). In our English-to-Czech experiments, the internal homogeneity of minibatches has no effect on the training but some of our "curricula" achieve a small improvement…
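As a concrete illustration of the two orderings, the Python sketch below groups sentence pairs of similar length into the same minibatch and gradually admits longer sentences as training progresses. This is a minimal sketch, not the paper's implementation: the use of source-sentence length as both the bucketing and curriculum criterion, the linear length threshold, and the helper names are assumptions made for illustration.

```python
import random

def bucket_minibatches(sentence_pairs, batch_size):
    """Make each minibatch internally homogeneous by sorting pairs by
    source length before slicing, then shuffle the order of batches."""
    pairs = sorted(sentence_pairs, key=lambda p: len(p[0].split()))
    batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
    random.shuffle(batches)
    return batches

def curriculum_pool(sentence_pairs, progress, max_len=100):
    """Return the pairs eligible at this point of training; `progress`
    is the fraction of training completed (0.0 to 1.0), so short
    sentences are admitted first and longer ones only later."""
    threshold = max(5, int(progress * max_len))
    return [p for p in sentence_pairs if len(p[0].split()) <= threshold]

# Toy English-Czech pairs; a real setup would stream a parallel corpus.
data = [("a cat sleeps", "kočka spí"), ("the dog runs fast", "pes běží rychle")]
for batch in bucket_minibatches(curriculum_pool(data, progress=0.5), batch_size=2):
    pass  # each `batch` would feed one NMT training step
```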

Cited by 127 publications (148 citation statements)
References: 16 publications
“…Wang et al. (2018b) define noise level and introduce a denoising curriculum. Kocmi and Bojar (2017) use linguistically-motivated features to classify examples into bins for scheduling. use reinforcement learning to learn a denoising curriculum based on noise level of examples.…”
Section: Curriculum Learning for NMT (mentioning)
confidence: 99%
“…The idea of a curriculum was popularized by Bengio et al. (2009), who viewed it as a way to improve convergence by presenting heuristically-identified easy examples first. Two recent papers (Kocmi and Bojar, 2017; Zhang et al., 2018) explore similar ideas for NMT, and verify that this strategy can reduce training time and improve quality.…”
Section: Related Work (mentioning)
confidence: 94%
“…4) The performance of Kocmi and Bojar (2017) and Zhang et al. (2017) decreased significantly after reaching the highest BLEU. This is consistent with the hypothesis that NMT may forget the learned knowledge by directly removing corresponding sentences.…”
Section: Training Efficiency (mentioning)
confidence: 95%
“…Besides the PBSMT (Koehn et al., 2007) and vanilla NMT, three typical existing approaches described in the introduction were empirically compared: 1) Curriculum learning using the source sentence length as the criterion (Kocmi and Bojar, 2017). 2) Gradual fine-tuning using language model-based cross-entropy (Wees et al., 2017).…”
Section: Baselines and Settings (mentioning)
confidence: 99%
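The length-based curriculum baseline mentioned in the statement above can be sketched as a simple bin schedule: sentence pairs are grouped into bins by source length and bins are added cumulatively from shortest to longest. The bin width, the number of epochs per stage, and the cumulative inclusion policy are illustrative assumptions, not the exact setup evaluated in the cited comparison.

```python
from collections import defaultdict

def length_bins(sentence_pairs, bin_width=10):
    """Key each sentence pair by its source-length bin (0-9, 10-19, ...)."""
    bins = defaultdict(list)
    for src, tgt in sentence_pairs:
        bins[len(src.split()) // bin_width].append((src, tgt))
    return bins

def curriculum_schedule(bins, epochs_per_stage=1):
    """Yield one training pool per epoch, cumulatively adding longer bins."""
    active = []
    for key in sorted(bins):
        active.extend(bins[key])
        for _ in range(epochs_per_stage):
            yield list(active)  # train one epoch on everything admitted so far
```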