Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1126

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation

Abstract: Simultaneous machine translation begins to translate each source sentence before the source speaker is finished speaking, with applications to live and streaming scenarios. Simultaneous systems must carefully schedule their reading of the source sentence to balance quality against latency. We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural machine translation (NMT) model that attends over all source tokens read thus far. We do so by introducing Monotonic In…
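The abstract's core mechanism pairs a hard, monotonic attention head that schedules reading with a soft attention head that spans every source token read so far. A minimal inference-time sketch of one such decoder step follows; p_choose, energy_fn, and the greedy 0.5 stopping threshold are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def milk_decode_step(p_choose, encoder_states, prev_head, query, energy_fn):
    """One MILk-style decoder step at inference time (greedy sketch).

    p_choose[j]    : probability of stopping the monotonic head on source token j
    encoder_states : (num_read, dim) states for the source tokens read so far
    prev_head      : index where the monotonic head stopped at the previous step
    query          : current decoder state used for soft attention
    energy_fn      : soft-attention energy function (placeholder)
    """
    # Advance the hard monotonic head: keep reading source tokens until the
    # stopping probability exceeds 0.5 or the available input is exhausted.
    head = prev_head
    while head < len(p_choose) - 1 and p_choose[head] < 0.5:
        head += 1

    # Soft attention looks back over *all* tokens up to the head (the
    # "infinite lookback"), rather than over a fixed-size chunk.
    energies = np.array([energy_fn(query, encoder_states[j]) for j in range(head + 1)])
    weights = np.exp(energies - energies.max())
    weights /= weights.sum()
    context = weights @ encoder_states[: head + 1]
    return context, head
```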

Cited by 147 publications (234 citation statements). References 16 publications.
“…For the MoChA model we use a chunk size of 8. In the MILK model we applied a latency loss different from the one proposed in [17], as the original latency loss is tailored for machine translation, where the source and target sequences have similar lengths. Our latency loss minimizes the root-mean-square value of the interval between two consecutive emissions. Table 1: Word-error-rates of end-to-end models on YouTube test set.…”
Section: Methods
confidence: 99%
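The formula this statement introduces was lost in extraction; a plausible formalization of a root-mean-square emission-interval penalty, assuming g(i) denotes the amount of source (e.g., encoder frames) consumed before emitting the i-th output token, might look like the following. This is an illustrative reconstruction, not the cited paper's actual equation.

```latex
% Hypothetical RMS emission-interval latency loss (illustrative only).
% g(i) = amount of source consumed before emitting the i-th output token.
\mathcal{L}_{\text{latency}} =
\sqrt{\frac{1}{|\mathbf{y}|-1}\sum_{i=2}^{|\mathbf{y}|}\bigl(g(i)-g(i-1)\bigr)^{2}}
```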
“…This fixed window size may still limit the full potential of the attention mechanism. The monotonic infinite lookback attention (MILK) mechanism was proposed in [17] to allow the attention window to look back all the way to the beginning of the sequence.…”
Section: Monotonic Infinite Lookback Attention
confidence: 99%
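The contrast the statement draws is in which encoder states the soft attention may cover once the monotonic head stops at position t: MoChA restricts it to a fixed chunk of size w (8 in the quote above), while MILK lets it reach back to the start. A schematic slice, purely for illustration:

```python
def attention_window(encoder_states, t, w=8, infinite_lookback=False):
    # Which encoder states the soft attention may cover once the monotonic
    # head has stopped at index t (0-based). Schematic only.
    if infinite_lookback:
        return encoder_states[: t + 1]                    # MILK: back to the start
    return encoder_states[max(0, t - w + 1): t + 1]       # MoChA: fixed chunk of size w
```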
“…With this refined g, we can make several latency metrics content-aware, including average proportion (Cho and Esipova, 2016), consecutive wait (Gu et al., 2017), average lagging, and differentiable average lagging (Arivazhagan et al., 2019b). We opt for differentiable average lagging (DAL) because of its interpretability and because it sidesteps some problems with average lagging (Cherry and Foster, 2019).…”
Section: Latency
confidence: 99%
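Differentiable average lagging (DAL) charges a minimum cost per output so that delays accumulate rather than being forgiven by later, faster emissions. Below is a sketch under the commonly used definition, with the "refined g" from the quote supplied as a list; treat it as an illustration rather than the cited works' exact code.

```python
def differentiable_average_lagging(g, src_len, tgt_len):
    """Sketch of differentiable average lagging (DAL).

    g[i] : (possibly fractional) amount of source consumed before emitting
           target token i + 1, e.g., the content-aware, refined g above.
    """
    gamma = tgt_len / src_len          # target-to-source length ratio
    g_prime = []
    for i, gi in enumerate(g):
        if i == 0:
            g_prime.append(gi)
        else:
            # Each output charges at least 1/gamma, so lag cannot be "repaid".
            g_prime.append(max(gi, g_prime[-1] + 1.0 / gamma))
    # Average lag relative to an ideally paced translator.
    return sum(gp - i / gamma for i, gp in enumerate(g_prime)) / tgt_len
```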
“…We employ their wait-k training as a baseline, and use their wait-k inference to improve re-translation. Our second and strongest streaming baseline is the MILk approach of Arivazhagan et al. (2019b), who improve upon wait-k training with a hierarchical attention that can adapt how long it waits based on the current context. Both wait-k training and MILk attention provide hyper-parameters to control their quality-latency trade-offs: k for wait-k, and the latency weight for MILk.…”
Section: Introduction
confidence: 99%
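Wait-k's fixed schedule is simple to state: read k source tokens, then alternate one write with one read until the source runs out, after which the model writes freely. A sketch of that schedule, where translate_step is a placeholder for whichever prefix-to-prefix model produces the next target token (illustrative, not either paper's implementation):

```python
def wait_k_translate(k, source_tokens, translate_step):
    """Fixed wait-k schedule: stay k source tokens ahead of the writes.

    translate_step(src_prefix, tgt_prefix) -> next target token (placeholder).
    Assumes translate_step eventually emits "</s>".
    """
    src_prefix, tgt_prefix = [], []
    pos = 0
    while True:
        # READ until k tokens ahead of the number of writes (or source ends).
        while pos < len(source_tokens) and pos < len(tgt_prefix) + k:
            src_prefix.append(source_tokens[pos])
            pos += 1
        token = translate_step(src_prefix, tgt_prefix)   # WRITE one target token
        if token == "</s>":
            break
        tgt_prefix.append(token)
    return tgt_prefix
```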
“…When the source sentence ends, the decoder can do a tail beam search on the remaining target words, but beam search is seemingly impossible before the source sentence ends. [Table: taxonomy of approaches by policy and model type. Fixed-latency policies: test-time wait-k (Dalvi et al., 2018) on sequence-to-sequence (full-sentence) models, wait-k on prefix-to-prefix (simultaneous) models. Adaptive policies: RL (Gu et al., 2017), MILk (Arivazhagan et al., 2019), supervised/imitation learning.] 2. The second method learns an adaptive policy which uses either supervised or reinforcement learning (Grissom II et al., 2014; Gu et al., 2017) to decide whether to READ (the next source word) or WRITE (the next target word).…”
Section: Simultaneous MT: Policies and Models
confidence: 99%
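The READ/WRITE framing above reduces to one decoding loop in which a policy, whether fixed like wait-k or learned like MILk or an RL agent, picks the next action from the current prefixes. A hedged sketch with both the policy and the translation model as placeholder callables:

```python
READ, WRITE = "READ", "WRITE"

def simultaneous_decode(policy, model, source_tokens):
    """Generic simultaneous decoding loop driven by a READ/WRITE policy.

    policy(src_prefix, tgt_prefix) -> READ or WRITE  (placeholder callable)
    model(src_prefix, tgt_prefix)  -> next target token (placeholder callable)
    """
    src_prefix, tgt_prefix = [], []
    while True:
        action = policy(src_prefix, tgt_prefix)
        if action == READ and len(src_prefix) < len(source_tokens):
            # READ: consume the next source token.
            src_prefix.append(source_tokens[len(src_prefix)])
        else:
            # WRITE (forced once the source is exhausted).
            token = model(src_prefix, tgt_prefix)
            if token == "</s>":
                break
            tgt_prefix.append(token)
    return tgt_prefix
```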