Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2048

Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation

Abstract: Traditional neural machine translation (NMT) involves a fixed training procedure where each sentence is sampled once during each epoch. In reality, some sentences are well learned during the initial few epochs; however, with this approach, the well-learned sentences would continue to be trained along with those sentences that were not well learned for 10-30 epochs, which wastes training time. Here, we propose an efficient method to dynamically sample the sentences in order to accelerate the NMT traini…
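The abstract above is truncated; purely as an illustration of the idea it describes, the sketch below weights each sentence by how much its training cost dropped between two consecutive epochs and draws the next epoch's training subset from that distribution, so that already well-learned sentences are rarely re-sampled. This is a minimal sketch under assumed details: the function name dynamic_sample, the keep_ratio parameter, and the epsilon smoothing are illustrative and not taken from the paper.

```python
import numpy as np

def dynamic_sample(prev_costs, curr_costs, keep_ratio=0.2, rng=None):
    """Choose which sentences to train on in the next epoch.

    Sentences whose per-sentence cost is still dropping get a larger
    sampling weight; sentences that look well learned (little or no
    cost decrease) are rarely re-sampled.
    """
    rng = rng or np.random.default_rng()
    prev_costs = np.asarray(prev_costs, dtype=float)
    curr_costs = np.asarray(curr_costs, dtype=float)

    # Per-sentence improvement since the previous epoch (clipped at 0).
    decrease = np.maximum(prev_costs - curr_costs, 0.0)

    # Small epsilon keeps every sentence reachable and avoids a zero-sum.
    weights = decrease + 1e-8
    weights /= weights.sum()

    n_keep = max(1, int(keep_ratio * len(curr_costs)))
    # Weighted sampling without replacement: improving sentences dominate.
    chosen = rng.choice(len(curr_costs), size=n_keep, replace=False, p=weights)
    return np.sort(chosen)

# Toy usage: sentences 0 and 2 are still improving, so they are favoured.
prev = np.array([2.0, 1.5, 3.0, 0.9])
curr = np.array([1.2, 1.5, 1.8, 0.9])
print(dynamic_sample(prev, curr, keep_ratio=0.5))
```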

Cited by 24 publications (10 citation statements)
References: 8 publications
“…We explore difficulty criteria based on NMT model scores as well as linguistic properties. We consider a wide range of schedules, based not only on the easy-to-difficult ordering, but also on strategies developed independently from curriculum learning, such as dynamic sampling and boosting (Zhang et al., 2017; van der Wees et al., 2017; Wang et al., 2018).…”
Section: Introduction
confidence: 99%
“…Previous work has focused on dynamic sampling strategies, emphasizing training on samples that are expected to be most useful based on model scores or domain relevance. Inspired by boosting (Schapire, 2002), Zhang et al. (2017) and Wang et al. (2018) improve the training efficiency of NMT by dynamically selecting different subsets of the training data between epochs. The former performs this dynamic data selection according to domain relevance (Axelrod et al., 2011), while the latter uses the difference between the training costs of two iterations.…”
Section: Introduction
confidence: 99%
“…In the second strand, a wide variety of methods have been proposed to deal with noise in training data (Schwenk, 2018; Guo et al., 2018; Xu and Koehn, 2017; Koehn et al., 2018; van der Wees et al., 2017; Wang and Neubig, 2019; Wang et al., 2018a,b, 2019).…”
Section: Related Work
confidence: 99%
“…CL is also used for denoising (Wang et al., 2018a,b), and for faster convergence and improved general quality (Zhang et al., 2018; Platanios et al., 2019). Wang et al. (2018a) introduce a curriculum for training efficiency. In addition to data sorting/curriculum, instance/loss weighting (Wang et al., 2017; Wang et al., 2019b) has been used as an alternative.…”
Section: Related Work
confidence: 99%