Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6322

Correcting Length Bias in Neural Machine Translation

Abstract: We study two problems in neural machine translation (NMT). First, in beam search, whereas a wider beam should in principle help translation, it often hurts NMT. Second, NMT has a tendency to produce translations that are too short. Here, we argue that these problems are closely related and both rooted in label bias. We show that correcting the brevity problem almost eliminates the beam problem; we compare some commonly-used methods for doing this, finding that a simple per-word reward works well; and we introd…
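The "simple per-word reward" mentioned in the abstract counteracts the model's preference for short outputs by adding a constant bonus for every word a hypothesis generates. Below is a minimal Python sketch of that idea, assuming hypotheses are scored by summed per-token log-probabilities; the function name, the rescore-at-the-end simplification (the paper applies the reward during beam search itself), and the toy numbers are illustrative, not taken from the paper's code.

```python
import math

def rescore_with_word_reward(hypotheses, reward):
    """Add a constant per-word reward to each hypothesis score.

    hypotheses: list of (tokens, log_prob) pairs, where log_prob is the
    sum of the model's per-token log-probabilities (a locally normalized
    NMT model). A reward > 0 counteracts the bias toward short outputs.
    """
    rescored = []
    for tokens, log_prob in hypotheses:
        score = log_prob + reward * len(tokens)
        rescored.append((tokens, score))
    # Pick the best hypothesis under the corrected score.
    return max(rescored, key=lambda pair: pair[1])

# Toy example: a short hypothesis with higher raw log-probability loses
# to a longer, more adequate one once the per-word reward is applied.
hyps = [
    (["the", "cat", "sat"], math.log(0.5)),                      # short, high prob
    (["the", "cat", "sat", "on", "the", "mat"], math.log(0.2)),  # longer, lower prob
]
best_tokens, best_score = rescore_with_word_reward(hyps, reward=0.4)
print(best_tokens)
```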

Cited by 120 publications (106 citation statements). References 17 publications.

“…The length ratio is not just about BLEU: if the hypothesis length is only 75% of reference length, something that should have been translated must be missing; i.e., bad adequacy. Indeed, Murray and Chiang (2018) confirm the same phenomenon with METEOR. Pre-neural SMT models, being probabilistic, also favor short translations (and derivations), which is addressed by word (and phrase) reward. The crucial difference between SMT and NMT is that the former stops when covering the whole input, while the latter stops on emitting </eos>.…”
mentioning
confidence: 54%
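The 75% figure in the statement above is just the hypothesis-to-reference length ratio. A tiny sketch of that check, where the 0.75 threshold is only the illustrative value from the quote, not a prescribed cutoff:

```python
def length_ratio(hyp_tokens, ref_tokens):
    """Hypothesis length divided by reference length."""
    return len(hyp_tokens) / len(ref_tokens)

hyp = "the cat sat".split()
ref = "the cat sat on the mat".split()

ratio = length_ratio(hyp, ref)
if ratio < 0.75:
    # A ratio this low suggests content is simply missing from the
    # translation (bad adequacy), independently of the BLEU brevity penalty.
    print(f"suspiciously short: length ratio = {ratio:.2f}")
```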
“…Murray and Chiang (2018) attribute the fact that beam search prefers shorter candidates to the label bias problem (Lafferty et al., 2001) due to NMT's local normalization.…”
mentioning
confidence: 99%
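Because a locally normalized model scores a hypothesis as a sum of per-token log-probabilities, and every log-probability is at most zero, each additional token can only lower the total score; uncorrected beam search therefore tends to prefer the candidate that emits </eos> earliest. A small numeric illustration (toy probabilities, not from either cited paper):

```python
import math

# Per-token probabilities under a locally normalized model; every
# factor is <= 1, so every log term is <= 0.
short_hyp = [0.6, 0.5, 0.9]             # stops early at </eos>
long_hyp = [0.6, 0.5, 0.6, 0.7, 0.8]    # keeps translating

score_short = sum(math.log(p) for p in short_hyp)
score_long = sum(math.log(p) for p in long_hyp)

# The shorter candidate wins on raw model score simply because it has
# fewer negative terms, regardless of which translation is more adequate.
print(score_short > score_long)  # True
```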
“…A potential confound is that performance might change with the length of the source in BiLSTMs (Carpuat et al., 2013; Murray and Chiang, 2018); in Transformers it was reported to increase. Length is generally greater in the challenge set than in the full test set, and generally increases with d, showing if anything a decrease of performance with length.…”
Section: Methods
mentioning
confidence: 99%
“…This difficulty has been observed in datasets designed to test the ability of models to compositionally generalize, such as SCAN, where the best performing neural models do not even exceed 20% accuracy on generating sequences of out-of-domain lengths, whereas in-domain performance is 100%. Extrapolation has also been a challenge for neural machine translation; Murray and Chiang (2018) identify models producing translations that are too short as one of the main challenges for neural MT.…”
Section: Related Work
mentioning
confidence: 99%