Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.76

Token-level Adaptive Training for Neural Machine Translation

Abstract: There exists a token imbalance phenomenon in natural language, as different tokens appear with different frequencies, which leads to different learning difficulties for tokens in Neural Machine Translation (NMT). The vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens with different frequencies and tends to generate more high-frequency tokens and fewer low-frequency tokens compared with the golden token distribution. However, low-frequency tokens may carry critical semantic inform…
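The token imbalance the abstract describes can be made concrete by counting target-side token frequencies. Below is a minimal Python sketch of such a count, assuming a pre-tokenized target-side training file; the file name and the top-100 summary are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def token_frequencies(corpus_path):
    """Count target-side token frequencies to expose the imbalance
    between high- and low-frequency tokens (illustrative sketch)."""
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())  # assumes pre-tokenized text
    return counts

# Hypothetical usage: show how much probability mass the most frequent
# tokens absorb compared with the long tail of rare tokens.
freqs = token_frequencies("train.tgt")  # assumed file name
total = sum(freqs.values())
top_mass = sum(c for _, c in freqs.most_common(100)) / total
print(f"top-100 tokens cover {top_mass:.1%} of all target tokens")
```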

Cited by 22 publications (30 citation statements)
References 30 publications
“…We use a strong baseline system in this work in order to make the evaluation convincing. Improvements over existing methods (Gu et al., 2020) are statistically significant (Koehn, 2004) in contrast to all other models (p<0.01) on the Zh-En task.…”
Section: Results (mentioning)
confidence: 94%
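The quoted snippet reports significance at p<0.01 using the test of Koehn (2004), i.e., paired bootstrap resampling. The following is a minimal sketch of that test, not the cited paper's exact setup; the use of sacrebleu for corpus-level BLEU, the sample count, and the function name are assumptions.

```python
import random
import sacrebleu

def paired_bootstrap(sys_a, sys_b, refs, n_samples=1000, seed=0):
    """Paired bootstrap resampling (Koehn, 2004): estimate how often
    system A beats system B in BLEU on resampled test sets."""
    rng = random.Random(seed)
    n = len(refs)
    wins = 0
    for _ in range(n_samples):
        sample = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        a = [sys_a[i] for i in sample]
        b = [sys_b[i] for i in sample]
        r = [refs[i] for i in sample]
        if sacrebleu.corpus_bleu(a, [r]).score > sacrebleu.corpus_bleu(b, [r]).score:
            wins += 1
    return 1.0 - wins / n_samples  # one-sided p-value for "A is not better than B"
```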
“…Exponential (Gu et al., 2020). This method adds additional training weights to low-frequency target tokens:…”
Section: Systems (mentioning)
confidence: 99%
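The snippet above refers to the Exponential weighting of Gu et al. (2020), which assigns extra training weight to low-frequency target tokens; the formula itself is truncated in the quotation. The sketch below therefore uses a generic exponential decay in corpus frequency, with the scale a and rate lam as illustrative assumptions rather than the paper's actual hyperparameters.

```python
import math

def exponential_token_weight(freq, a=1.0, lam=1e-5):
    """Training weight that decays exponentially with a token's corpus
    frequency, so rare tokens receive larger weights; the +1 keeps every
    weight at least 1 (illustrative parameterization)."""
    return 1.0 + a * math.exp(-lam * freq)

# e.g. a rare token gets close to 1 + a, a very frequent token close to 1:
# exponential_token_weight(5) > exponential_token_weight(500000)
```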
“…Wang and Sennrich (2020) link the exposure bias problem (Ranzato et al., 2016; Shao et al., 2018; Zhang et al., 2019) to the phenomenon that NMT tends to generate hallucinations under domain shift. Gu et al. (2020) find that NMT tends to generate more high-frequency tokens and fewer low-frequency tokens than the reference. Compared with these works, this work mainly focuses on investigating the functions of the different modules and parameters in the NMT model during continual training.…”
Section: Related Work (mentioning)
confidence: 94%
“…where l(x^n_i, y^n_i, t) stands for the cross-entropy loss of the i-th token of the example (x^n, y^n) at the t-th training step. We constrain all weights to be larger than 1 so that the gradient norm is not diminished during backpropagation (Gu et al., 2020).…”
Section: Data Weight (mentioning)
confidence: 99%
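A minimal PyTorch sketch of the token-level weighted cross-entropy described in the quotation, assuming per-token weights that are all at least 1 (e.g. produced by an exponential scheme like the one sketched earlier). The tensor shapes, function name, and padding index are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_token_loss(logits, targets, token_weights, pad_id=0):
    """Token-level weighted cross-entropy: per-token losses are scaled
    by weights >= 1 so rare tokens contribute larger gradients.
    logits: (batch, seq, vocab); targets, token_weights: (batch, seq)."""
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape_as(targets)
    mask = (targets != pad_id).float()
    return (loss * token_weights * mask).sum() / mask.sum()
```

Normalizing by the number of non-padding tokens keeps the loss scale comparable to the unweighted baseline while letting the per-token weights shift gradient mass toward low-frequency tokens.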