Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1342

Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation

Abstract: Beam search is widely used in neural machine translation, and usually improves translation quality compared to greedy search. However, it has been widely observed that beam sizes larger than 5 hurt translation quality. We explain why this happens, and propose several methods to address this problem. Furthermore, we discuss the optimal stopping criteria for these methods. Results show that our hyperparameter-free methods outperform the widely-used hyperparameter-free heuristic of length normalization by +2.0 BLEU.
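
To make the length bias concrete: under plain log-probability scoring, every extra token contributes a negative term, so wider beams surface ever-shorter candidates. Below is a minimal Python sketch of the length-normalization baseline the abstract compares against, not the paper's own rescoring methods; the beam contents and scores are invented for illustration.

```python
def rescore_length_norm(hypotheses):
    """Pick the best hypothesis by length-normalized log-probability.

    `hypotheses` is a list of (tokens, logprob) pairs, e.g. a final beam.
    Raw log-probability favors short outputs; dividing by length removes
    that bias, which is the widely used heuristic the paper's
    hyperparameter-free methods are compared against.
    """
    return max(hypotheses, key=lambda h: h[1] / len(h[0]))


# Invented final beam: the short hypothesis wins on raw score, the longer
# (adequate) one wins after length normalization.
beam = [
    (["the", "cat", "."], -3.0),                             # -1.00 per token
    (["the", "cat", "sat", "on", "the", "mat", "."], -4.9),  # -0.70 per token
]
print(max(beam, key=lambda h: h[1])[0])  # raw score picks the short hypothesis
print(rescore_length_norm(beam)[0])      # length norm picks the longer one
```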

Cited by 88 publications (64 citation statements)
References 16 publications
“…It has previously been pointed out that NMT performance suffers from beam sizes beyond 5 or 10 (Koehn and Knowles, 2017; Tu et al., 2017), and numerous methods have been proposed to circumvent this (Huang et al., 2017; Yang et al., 2018). However, for our present way of dubbing optimization based on N-best rescoring, high beam sizes are essential for the dubbing rescoring described in Algorithm 1 to have some material to work with.…”
Section: Experiments and Results (mentioning)
confidence: 99%
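
The beam-width point above can be sketched generically: a reranker can only choose among hypotheses the beam actually kept, so a narrow beam leaves an external criterion nothing to trade off against. The citing paper's Algorithm 1 (dubbing rescoring) is not reproduced here; `external_score` and `weight` are assumed placeholders for a task-specific criterion and its interpolation weight.

```python
def nbest_rescore(nbest, external_score, weight=1.0):
    """Rerank an N-best list of (hypothesis, model_logprob) pairs.

    With a narrow beam (say 5), the list rarely contains candidates that
    trade a little model score for a much better external score; a wide
    beam is what gives the reranker material to work with.
    """
    return max(nbest, key=lambda h: h[1] + weight * external_score(h[0]))
```
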
“…However, input and output sentences generally have different lengths. In some extreme directions such as Chinese to English, the target side is significantly longer than the source side, with an average gold tgt/src ratio, r = |y|/|x|, of around 1.25 (Huang et al., 2017; Yang et al., 2018). In this case, if we still follow the vanilla wait-k policy, the tail length will be 0.25|x| + k, which increases with input length.…”
Section: Discussion (mentioning)
confidence: 99%
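
The tail-length arithmetic can be checked with a small sketch of the vanilla wait-k read schedule, under the standard assumption that target token t is emitted after reading min(k + t - 1, |x|) source tokens; the concrete lengths below are illustrative.

```python
def waitk_read_schedule(src_len, k):
    """Vanilla wait-k: before emitting target token t (1-based),
    the model has read min(k + t - 1, src_len) source tokens."""
    return lambda t: min(k + t - 1, src_len)


# With a gold tgt/src ratio r = |y|/|x| of about 1.25, the target outruns
# the source, so the final tokens are all emitted after the source is
# exhausted -- the "tail", whose length grows with input length.
src_len, k, r = 20, 3, 1.25
tgt_len = round(r * src_len)  # |y| = 25
g = waitk_read_schedule(src_len, k)
tail = sum(1 for t in range(1, tgt_len + 1) if g(t) >= src_len)
print(tail)  # 8 == 0.25 * src_len + k
```
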
“…This inductive bias appears to be paramount for generating desirable text from neural probabilistic text generators. While several works explore this phenomenon (Murray and Chiang, 2018; Yang et al., 2018; Stahlberg and Byrne, 2019; Cohen and Beck, 2019), no one has yet hypothesized what beam search's hidden inductive bias may be. Our work fills this gap.…”
Section: Introduction (mentioning)
confidence: 99%