Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2048

Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation

Abstract: Traditional neural machine translation (NMT) involves a fixed training procedure where each sentence is sampled once during each epoch. In reality, some sentences are well learned during the initial few epochs; however, with this approach, the well-learned sentences would continue to be trained along with those sentences that were not well learned for 10-30 epochs, which wastes training time. Here, we propose an efficient method to dynamically sample the sentences in order to accelerate the NMT traini…
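The abstract above is truncated; purely as an illustration of the idea it describes, the sketch below weights each sentence by how much its training cost dropped between two consecutive epochs and draws the next epoch's training subset from that distribution, so that already well-learned sentences are rarely re-sampled. This is a minimal sketch under assumed details: the function name dynamic_sample, the keep_ratio parameter, and the epsilon smoothing are illustrative and not taken from the paper.

```python
import numpy as np

def dynamic_sample(prev_costs, curr_costs, keep_ratio=0.2, rng=None):
    """Choose which sentences to train on in the next epoch.

    Sentences whose per-sentence cost is still dropping get a larger
    sampling weight; sentences that look well learned (little or no
    cost decrease) are rarely re-sampled.
    """
    rng = rng or np.random.default_rng()
    prev_costs = np.asarray(prev_costs, dtype=float)
    curr_costs = np.asarray(curr_costs, dtype=float)

    # Per-sentence improvement since the previous epoch (clipped at 0).
    decrease = np.maximum(prev_costs - curr_costs, 0.0)

    # Small epsilon keeps every sentence reachable and avoids a zero-sum.
    weights = decrease + 1e-8
    weights /= weights.sum()

    n_keep = max(1, int(keep_ratio * len(curr_costs)))
    # Weighted sampling without replacement: improving sentences dominate.
    chosen = rng.choice(len(curr_costs), size=n_keep, replace=False, p=weights)
    return np.sort(chosen)

# Toy usage: sentences 0 and 2 are still improving, so they are favoured.
prev = np.array([2.0, 1.5, 3.0, 0.9])
curr = np.array([1.2, 1.5, 1.8, 0.9])
print(dynamic_sample(prev, curr, keep_ratio=0.5))
```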

Cited by 24 publications (10 citation statements)
References: 8 publications
“…We explore difficulty criteria based on NMT model scores as well as linguistic properties. We consider a wide range of schedules, based not only on the easy-to-difficult ordering, but also on strategies developed independently from curriculum learning, such as dynamic sampling and boosting (Zhang et al., 2017; van der Wees et al., 2017; Wang et al., 2018).…”
Section: Introduction
confidence: 99%
“…Previous work has focused on dynamic sampling strategies, emphasizing training on samples that are expected to be most useful based on model scores or domain relevance. Inspired by boosting (Schapire, 2002), Zhang et al. (2017) and Wang et al. (2018) improve the training efficiency of NMT by dynamically selecting different subsets of the training data between epochs. The former performs this dynamic data selection according to domain relevance (Axelrod et al., 2011), while the latter uses the difference between the training costs of two iterations.…”
Section: Introduction
confidence: 99%
“…In the second strand, a wide variety of methods have been proposed to deal with noise in training data (Schwenk, 2018; Guo et al., 2018; Xu and Koehn, 2017; Koehn et al., 2018; van der Wees et al., 2017; Wang and Neubig, 2019; Wang et al., 2018a,b, 2019).…”
Section: Related Work
confidence: 99%
“…CL is also used for denoising (Wang et al., 2018a,b), and for faster convergence and improved general quality (Zhang et al., 2018; Platanios et al., 2019). Wang et al. (2018a) introduce a curriculum for training efficiency. In addition to data sorting/curriculum, instance/loss weighting (Wang et al., 2017; Wang et al., 2019b) has been used as an alternative.…”
Section: Related Work
confidence: 99%