Improving Beam Search by Removing Monotonic Constraint for Neural Machine Translation

Shu, Raphael; Nakayama, Hideki

doi:10.18653/v1/p18-2054

Cited by 11 publications

(7 citation statements)

References 8 publications


“…This corresponds to searching with a non-admissible heuristic (Hart et al, 1968), that is, a heuristic that may underestimate the likelihood of completing a translation. This biased search affects statistics of beam search outputs in unknown ways and may well account for some of the pathologies of Section 2, and has motivated variants of the algorithm aimed at comparing partial translations more fairly (Huang et al, 2017; Shu and Nakayama, 2018). This problem has also been studied in parsing literature, where it's known as imbalanced probability search bias (Stanojević and Steedman, 2020).…”

Section: NMT and Its Many Biases (mentioning)

confidence: 99%

Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

Eikema

Aziz

2020

Proceedings of the 28th International Conference on Computational Linguistics


Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest there is something fundamentally wrong with NMT as a model or its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode. We argue that the evidence corroborates the inadequacy of MAP decoding more than casts doubt on the model and its training algorithm. In this work, we show that translation distributions do reproduce various statistics of the data well, but that beam search strays from such statistics. We show that some of the known pathologies and biases of NMT are due to MAP decoding and not to NMT's statistical assumptions nor MLE. In particular, we show that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary. We therefore advocate for the use of decision rules that take into account the translation distribution holistically. We show that an approximation to minimum Bayes risk decoding gives competitive results confirming that NMT models do capture important aspects of translation well in expectation.
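The abstract above contrasts MAP decoding with a sampling-based approximation to minimum Bayes risk (MBR) decoding. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a list of translations already sampled from a model and picks the one with the highest average utility against all samples. The names `unigram_f1` and `mbr_decode`, the toy samples, and the choice of unigram F1 as the utility are all illustrative assumptions.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Toy utility (an assumption): F1 over whitespace-separated unigrams."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(samples, utility=unigram_f1):
    """Return the sample with the highest average utility against all samples.

    The samples serve both as candidates and as a Monte Carlo stand-in for
    the model's translation distribution.
    """
    def expected_utility(candidate):
        return sum(utility(candidate, other) for other in samples) / len(samples)

    return max(samples, key=expected_utility)


# Hypothetical samples; in practice these would be drawn from an NMT model.
samples = ["the cat sat", "the cat sat down", "a cat sat", "dogs bark loudly"]
best = mbr_decode(samples)
print(best)
```

Unlike taking the single highest-probability output, this rule favors a candidate that agrees with many other samples, which is one way of using the translation distribution as a whole.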


“…Previous non-monotonic methods (Serdyuk et al, 2018; Zhang et al, 2018; Zhou et al, 2019a,b; Zhang et al, 2019; Welleck et al, 2019) jointly leverage L2R and R2L information. Non-monotonic methods are also widely used in many tasks (Huang et al, 2018; Shu and Nakayama, 2018), such as parsing (Goldberg and Elhadad, 2010), image caption (Mehri and Sigal, 2018), and dependency parsing (Kiperwasser and Goldberg, 2016;. Similarly, insertion-based method (Gu et al, 2019; Stern et al, 2019) predicts the next token and its position to be inserted.…”

Section: Related Work (mentioning)

confidence: 99%

Smart-Start Decoding for Neural Machine Translation

Yang, Ma, Zhang et al.

2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


Most current neural machine translation models adopt a monotonic decoding order of either left-to-right or right-to-left. In this work, we propose a novel method that breaks up the limitation of these decoding orders, called Smart-Start decoding. More specifically, our method first predicts a median word. It starts to decode the words on the right side of the median word and then generates words on the left. We evaluate the proposed Smart-Start decoding method on three datasets. Experimental results show that the proposed method can significantly outperform strong baseline models.
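The decoding order described in the abstract above (median word first, then the words to its right, then the words to its left) can be illustrated by reordering a finished sentence. This is only a sketch of the ordering, not the Smart-Start model itself: the `<sep>` boundary marker, the function names, and the assumption that the left part is emitted left-to-right are hypothetical choices made for the example.

```python
SEP = "<sep>"  # hypothetical boundary marker, not taken from the paper


def to_generation_order(tokens, median_index):
    """Reorder a sentence: median word and right side first, then the left side."""
    return tokens[median_index:] + [SEP] + tokens[:median_index]


def to_sentence_order(generated):
    """Invert the reordering to recover the original left-to-right sentence."""
    split = generated.index(SEP)
    right, left = generated[:split], generated[split + 1:]
    return left + right


sentence = "we saw a small dog".split()
generated = to_generation_order(sentence, median_index=2)
restored = to_sentence_order(generated)
print(generated)
print(restored)
```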


“…Naive beam search with log-probabilities has several known drawbacks, for example, favoring short translations and its monotonic constraint. Hence, many regularization/rescoring methods [2,27,8,28,16] or beam search variants [6,21] are proposed to improve the performance of beam search. Other than beam search, one promising MAP decoding technique for evaluation is the DFS-based exact search [22], which is designed to find the mode of model distributions.…”

Section: Related Work (mentioning)

confidence: 99%

Rethinking the Evaluation of Neural Machine Translation

Yan, Wu, Meng et al.

2021

Preprint


The evaluation of neural machine translation systems is usually built upon generated translation of a certain decoding method (e.g., beam search) with evaluation metrics over the generated translation (e.g., BLEU). However, this evaluation framework suffers from high search errors brought by heuristic search algorithms and is limited by its nature of evaluation over one best candidate. In this paper, we propose a novel evaluation protocol, which not only avoids the effect of search errors but provides a system-level evaluation in the perspective of model ranking. In particular, our method is based on our newly proposed exact top-k decoding instead of beam search. Our approach evaluates model errors by the distance between the candidate spaces scored by the references and the model respectively. Extensive experiments on WMT'14 English-German demonstrate that bad ranking ability is connected to the well-known beam search curse, and state-of-the-art Transformer models are facing serious ranking errors. By evaluating various model architectures and techniques, we provide several interesting findings. Finally, to effectively approximate the exact search algorithm with same time cost as original beam search, we present a minimum heap augmented beam search algorithm.
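The citation statement for this entry notes that naive beam search with log-probabilities favors short translations. A small worked example shows why: every extra token adds a negative log-probability, so a longer hypothesis can score lower even when each of its tokens is more likely. The token probabilities below are invented for illustration, and dividing by length is just one common rescoring choice, not a method attributed to any of the cited papers.

```python
import math


def sequence_score(token_probs, length_normalize=False):
    """Sum of token log-probabilities, optionally divided by sequence length."""
    total = sum(math.log(p) for p in token_probs)
    return total / len(token_probs) if length_normalize else total


# Hypothetical per-token probabilities for two competing hypotheses.
short_hyp = [0.6, 0.6]   # 2 tokens, moderately confident
long_hyp = [0.8] * 6     # 6 tokens, each one more confident

raw_short = sequence_score(short_hyp)
raw_long = sequence_score(long_hyp)
norm_short = sequence_score(short_hyp, length_normalize=True)
norm_long = sequence_score(long_hyp, length_normalize=True)

# Raw scores prefer the short hypothesis; length-normalized scores do not.
print(raw_short > raw_long, norm_long > norm_short)
```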
