Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1342

Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation

Abstract: Beam search is widely used in neural machine translation, and usually improves translation quality compared to greedy search. However, it has been widely observed that beam sizes larger than 5 hurt translation quality. We explain why this happens, and propose several methods to address this problem. Furthermore, we discuss the optimal stopping criteria for these methods. Results show that our hyperparameter-free methods outperform the widely-used hyperparameter-free heuristic of length normalization by +2.0 BLEU.
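
To make the length bias concrete: under plain log-probability scoring, every extra token contributes a negative term, so wider beams surface ever-shorter candidates. Below is a minimal Python sketch of the length-normalization baseline the abstract compares against, not the paper's own rescoring methods; the beam contents and scores are invented for illustration.

```python
def rescore_length_norm(hypotheses):
    """Pick the best hypothesis by length-normalized log-probability.

    `hypotheses` is a list of (tokens, logprob) pairs, e.g. a final beam.
    Raw log-probability favors short outputs; dividing by length removes
    that bias, which is the widely used heuristic the paper's
    hyperparameter-free methods are compared against.
    """
    return max(hypotheses, key=lambda h: h[1] / len(h[0]))


# Invented final beam: the short hypothesis wins on raw score, the longer
# (adequate) one wins after length normalization.
beam = [
    (["the", "cat", "."], -3.0),                             # -1.00 per token
    (["the", "cat", "sat", "on", "the", "mat", "."], -4.9),  # -0.70 per token
]
print(max(beam, key=lambda h: h[1])[0])  # raw score picks the short hypothesis
print(rescore_length_norm(beam)[0])      # length norm picks the longer one
```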

Cited by 88 publications (64 citation statements)
References 16 publications
“…It has previously been pointed out that NMT performance suffers from beam sizes beyond 5 or 10 (Koehn and Knowles, 2017; Tu et al., 2017), and numerous methods have been proposed to circumvent this (Huang et al., 2017; Yang et al., 2018). However, for our present way of dubbing optimization based on N-best rescoring, high beam sizes are essential for the dubbing rescoring described in Algorithm 1 to have some material to work with.…”
Section: Experiments and Results (mentioning)
confidence: 99%
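
The beam-width point above can be sketched generically: a reranker can only choose among hypotheses the beam actually kept, so a narrow beam leaves an external criterion nothing to trade off against. The citing paper's Algorithm 1 (dubbing rescoring) is not reproduced here; `external_score` and `weight` are assumed placeholders for a task-specific criterion and its interpolation weight.

```python
def nbest_rescore(nbest, external_score, weight=1.0):
    """Rerank an N-best list of (hypothesis, model_logprob) pairs.

    With a narrow beam (say 5), the list rarely contains candidates that
    trade a little model score for a much better external score; a wide
    beam is what gives the reranker material to work with.
    """
    return max(nbest, key=lambda h: h[1] + weight * external_score(h[0]))
```
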
“…However, input and output sentences generally have different lengths. In some extreme directions such as Chinese to English, the target side is significantly longer than the source side, with an average gold tgt/src ratio, r = |y|/|x|, of around 1.25 (Huang et al., 2017; Yang et al., 2018). In this case, if we still follow the vanilla wait-k policy, the tail length will be 0.25|x| + k, which increases with input length.…”
Section: Discussion (mentioning)
confidence: 99%
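
The tail-length arithmetic can be checked with a small sketch of the vanilla wait-k read schedule, under the standard assumption that target token t is emitted after reading min(k + t - 1, |x|) source tokens; the concrete lengths below are illustrative.

```python
def waitk_read_schedule(src_len, k):
    """Vanilla wait-k: before emitting target token t (1-based),
    the model has read min(k + t - 1, src_len) source tokens."""
    return lambda t: min(k + t - 1, src_len)


# With a gold tgt/src ratio r = |y|/|x| of about 1.25, the target outruns
# the source, so the final tokens are all emitted after the source is
# exhausted -- the "tail", whose length grows with input length.
src_len, k, r = 20, 3, 1.25
tgt_len = round(r * src_len)  # |y| = 25
g = waitk_read_schedule(src_len, k)
tail = sum(1 for t in range(1, tgt_len + 1) if g(t) >= src_len)
print(tail)  # 8 == 0.25 * src_len + k
```
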
“…This inductive bias appears to be paramount for generating desirable text from neural probabilistic text generators. While several works explore this phenomenon (Murray and Chiang, 2018; Yang et al., 2018; Stahlberg and Byrne, 2019; Cohen and Beck, 2019), no one has yet hypothesized what beam search's hidden inductive bias may be. Our work fills this gap.…”
Section: Introduction (mentioning)
confidence: 99%