Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.689
Learning a Multi-Domain Curriculum for Neural Machine Translation

Abstract: Most data selection research in machine translation focuses on improving a single domain. We perform data selection for multiple domains at once. This is achieved by carefully introducing instance-level domain-relevance features and automatically constructing a training curriculum that gradually concentrates on multi-domain-relevant and noise-reduced data batches. Both the choice of features and the use of a curriculum are crucial for balancing and improving all domains, including out-of-domain. In large-scale exper…
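The mechanism the abstract describes — score each training instance with domain-relevance features, then gradually concentrate training on the top-scoring data — can be sketched roughly as follows. This is a toy illustration under assumed mechanics, not the paper's actual algorithm: the `features` field, the linear scoring, and the linear annealing schedule are all assumptions.

```python
def curriculum_batches(examples, weights, total_steps, batch_size, min_keep=0.2):
    """Toy multi-domain curriculum sketch (assumed mechanics, not the paper's
    exact method): rank examples by a weighted sum of instance-level
    domain-relevance features, then anneal training toward the top subset."""
    # Rank examples once by their weighted multi-domain relevance score.
    scored = sorted(
        examples,
        key=lambda ex: sum(w * f for w, f in zip(weights, ex["features"])),
        reverse=True,
    )
    for step in range(total_steps):
        # Anneal the eligible fraction of the corpus from 1.0 down to min_keep,
        # so training gradually concentrates on the most relevant, cleanest data.
        frac = 1.0 - (1.0 - min_keep) * step / max(1, total_steps - 1)
        pool = scored[: max(batch_size, int(len(scored) * frac))]
        yield pool[:batch_size]  # deterministic sketch; real training samples randomly
```

A real system would learn the feature weights (e.g. on dev-set performance) and sample batches stochastically from the eligible pool rather than always taking the top slice.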

Cited by 24 publications (12 citation statements). References 35 publications.
“…IV-C, the automatic CL methodologies in "RL Teacher" and "Other Automatic CL" (mostly) categories could learn to automatically and dynamically select the most suitable examples or tasks (with adjustable loss weights) for the current training step. Interestingly, in some of the works, the best curriculum found by the algorithm is the opposite of traditional CL, i.e., "hard to easy" [15], [96] or "starting big" (from full dataset to informative subset) [96], [97], [112]. A discussion on this seemingly paradoxical phenomenon will be made in Sec.…”
Section: Definition of CL (citation type: mentioning)
confidence: 99%
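The two opposite curricula contrasted in the statement above — traditional easy-to-hard ordering versus the learned "hard to easy" schedules — differ only in the direction of a difficulty ranking. A minimal sketch, where `difficulty_fn` is a hypothetical per-example scorer (e.g. sentence length or model loss):

```python
def order_by_difficulty(examples, difficulty_fn, hard_to_easy=False):
    """Sketch of the two curricula described above: traditional CL presents
    easy examples first; some automatically learned curricula reverse this."""
    # Sort ascending by difficulty for easy-to-hard; descending for hard-to-easy.
    return sorted(examples, key=difficulty_fn, reverse=hard_to_easy)
```

Using sentence length as a stand-in difficulty measure, `order_by_difficulty(sents, len)` yields the classic curriculum and `order_by_difficulty(sents, len, hard_to_easy=True)` the reversed one.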
“…We learn the curriculum using Bayesian Optimization (BO), for which we use an open-source implementation.⁴ Similar work has been proposed for transfer learning (Ruder and Plank, 2017) and NMT (Wang et al., 2020). As we already have a reasonably trained NMT model, we use it to compute instance-level features for learning the curriculum.…”
Section: Curriculum Learning (citation type: mentioning)
confidence: 99%
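The statement above describes learning curriculum weights by optimizing downstream dev performance. The outer loop can be sketched with a random-search stand-in for the BO proposal step (a real implementation would use a Bayesian Optimization library instead; `dev_score_fn`, the weight range, and the trial budget here are all assumptions):

```python
import random

def learn_curriculum_weights(dev_score_fn, dim, trials=200, seed=0):
    """Hedged stand-in for the BO loop described above: propose curriculum
    weight vectors, evaluate each via a (cheap) dev-set score, keep the best.
    Random search replaces the Bayesian proposal step for illustration."""
    rng = random.Random(seed)
    best_w, best_s = None, float("-inf")
    for _ in range(trials):
        w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # propose weights
        s = dev_score_fn(w)                               # evaluate on dev data
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s
```

In practice `dev_score_fn` would train (or fine-tune) a model under the curriculum induced by `w` and return dev-set quality, which is exactly why a sample-efficient optimizer like BO is preferred over random search.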
“…Recently, CL has been widely employed in machine learning for NLP. It improves the performance and training efficiency of NMT models based on linguistic features (Wang et al., 2020a), enhances multi-domain correlation, and addresses the domain-imbalance issue (Wang et al., 2020b). It has also been explored in other tasks, such as response generation (Shen and Feng, 2020) and reading comprehension (Tay et al., 2019).…”
Section: Related Work (citation type: mentioning)
confidence: 99%