Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-short.65
Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation

Abstract: Recently, token-level adaptive training has achieved promising improvement in machine translation, where the cross-entropy loss function is adjusted by assigning different training weights to different tokens, in order to alleviate the token imbalance problem. However, previous approaches only use static word frequency information in the target language without considering the source language, which is insufficient for bilingual tasks like machine translation. In this paper, we propose a novel bilingual mutual…
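To make the abstract's core idea concrete, here is a minimal PyTorch sketch of token-level adaptive training, i.e. a cross-entropy loss in which each target token's contribution is scaled by a per-token weight. This is not the authors' released code: the function name, tensor shapes, and pad_id handling are illustrative assumptions, and token_weights stands in for whatever weights the method derives (here, BMI-based statistics).

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, token_weights, pad_id=0):
    """Token-level adaptive cross-entropy (hypothetical sketch).

    logits:        (batch, seq_len, vocab) decoder output scores
    targets:       (batch, seq_len)        gold target token ids
    token_weights: (batch, seq_len)        adaptive weight per target token,
                                           e.g. derived from BMI statistics
    """
    vocab = logits.size(-1)
    # Per-token negative log-likelihood, kept unreduced so each token
    # can be re-weighted individually.
    nll = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1), reduction="none"
    ).reshape(targets.shape)
    # Ignore padding positions when averaging.
    mask = (targets != pad_id).float()
    return (nll * token_weights * mask).sum() / mask.sum()
```

With all weights fixed to 1.0 this reduces to standard cross-entropy, which is why such adaptive objectives are drop-in replacements for the usual NMT loss.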

Cited by 5 publications (20 citation statements). References 15 publications.
“…Mutual information (MI) is a general metric in information theory (Shannon, 1948), which measures the mutual dependence between two random variables a and b as follows: Xu et al. (2021) propose token-level bilingual mutual information (BMI) to measure the word mapping diversity between bilinguals and further conduct BMI-based adaptive training for NMT. The BMI is formulated as:…”
Section: Mutual Information for NMT
Confidence: 99%
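Both equations are truncated in the excerpt above. For reference, here is a LaTeX sketch of the standard pointwise mutual information, plus the token-level BMI under the assumption that Xu et al. (2021) estimate it from source-target token co-occurrence statistics in the parallel training corpus; this is the textbook definition, not a verbatim reconstruction of the citing paper's equations.

```latex
% Pointwise mutual information between two random variables a and b
% (Shannon, 1948); the excerpt's equation is cut off, so this is the
% standard definition.
\[
  \mathrm{MI}(a, b) = \log \frac{p(a, b)}{p(a)\, p(b)}
\]
% Assumed token-level BMI for a source token x and target token y,
% with joint and marginal probabilities estimated from co-occurrence
% counts in the parallel training data.
\[
  \mathrm{BMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}
\]
```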
“…BLEU results on WMT14 En→De and WMT19 Zh→En, with deltas over the corresponding Transformer baseline (the excerpt is truncated in places, marked with …):

Model                                        WMT14 En→De     WMT19 Zh→En
Transformer base (Vaswani et al., 2017) †    27.30           -
Transformer base (Vaswani et al., 2017)      28.10           25.36
+ Freq-Exponential (Gu et al., 2020)         28.43 (+0.33)   24.99 (-0.37)
+ Freq-Chi-Square (Gu et al., 2020)          28.47 (+0.37)   25.43 (+0.07)
+ BMI-adaptive (Xu et al., 2021)             28.56 (+0.45)   25.77 (+0.41)
+ Focal Loss (Lin et al., 2017)              28.43 (+0.33)   25.37 (+0.01)
+ Anti-Focal Loss (Raunak et al., 2020)      28.65 (+0.55)   25.50 (+0.14)
+ Self-Paced Learning (Wan et al., 2020)     28.69 (+0.59)   25.75 (+0.39)
+ Simple Fusion (Stahlberg et al., 2018)     27.82 (-0.28)   23.91 (-1.45)
+ LM Prior (Baziotis et al., 2020)           28…             …

… (Vaswani et al., 2017)                     29.31           25.48
+ Freq-Exponential (Gu et al., 2020)         29.66 (+0.35)   25.57 (+0.09)
+ Freq-Chi-Square (Gu et al., 2020)          29.64 (+0.33)   25.64 (+0.14)
+ BMI-adaptive (Xu et al., 2021)             29.69 (+0.38)   25.81 (+0.33)
+ Focal Loss (Lin et al., 2017)              29.65 (+0.34)   25.54 (+0.06)
+ Anti-Focal Loss (Raunak et al., 2020)      29.72 (+0.41)   25.64 (+0.16)
+ Self-Paced Learning (Wan et al., 2020)     29…             …

…(9) and (12). We fix scale s to 0.3 and tune scale t in a similar way.…”
Section: Model WMT14 En→De WMT19 Zh→En
Confidence: 99%