Interspeech 2020
DOI: 10.21437/interspeech.2020-1241

Efficient Wait-k Models for Simultaneous Machine Translation

Abstract: Simultaneous machine translation consists in starting output generation before the entire input sequence is available. Wait-k decoders offer a simple but efficient approach to this problem: they first read k source tokens, after which they alternate between producing a target token and reading another source token. We investigate the behavior of wait-k decoding in low-resource settings for spoken corpora using IWSLT datasets. We improve training of these models using unidirectional encoders, and training acro…
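The read/write schedule described in the abstract maps directly onto a small decoding loop. Below is a minimal sketch of that schedule, assuming a greedy `model.decode_step(source_prefix, target_prefix)` interface and a streaming source iterator; both names are hypothetical placeholders, not the paper's actual API.

```python
# Minimal sketch of a wait-k decoding schedule.
# `model.decode_step` and the streaming `source_stream` iterator are assumed interfaces.

def wait_k_decode(model, source_stream, k, max_len=200, eos="</s>"):
    """Alternate WRITE/READ actions after an initial lag of k source tokens."""
    src_tokens = []
    target = []

    # Initial lag: read the first k source tokens (or fewer if the input ends early).
    for _ in range(k):
        tok = next(source_stream, None)
        if tok is None:
            break
        src_tokens.append(tok)

    exhausted = False
    while len(target) < max_len:
        # WRITE: emit one target token conditioned on the source prefix read so far.
        next_word = model.decode_step(src_tokens, target)  # greedy step (assumption)
        if next_word == eos:
            break
        target.append(next_word)

        # READ: consume one more source token, if any remain.
        if not exhausted:
            tok = next(source_stream, None)
            if tok is None:
                exhausted = True
            else:
                src_tokens.append(tok)

    return target
```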

Cited by 56 publications (70 citation statements) · References 14 publications
“…In this work we train Transformer (Vaswani et al., 2017) and Pervasive Attention (Elbayad et al., 2018) models for the tasks of online and offline translation. Following Elbayad et al. (2020), we use unidirectional encoders and train the online MT models with k_train = 7, shown to yield better translations across the latency spectrum. We train our models on the IWSLT'14 De→En (German→English) and En→De (English→German) datasets (Cettolo et al., 2014).…”
Section: Methods · Citation type: mentioning
Confidence: 99%
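As a rough illustration of the unidirectional-encoder wait-k setup referenced in this statement, the sketch below builds the two boolean attention masks such a model would need: a causal mask for encoder self-attention and a wait-k mask for decoder-to-encoder attention. Tensor shapes and the k = 7 default are assumptions for the example, not the cited implementation.

```python
import torch

def unidirectional_encoder_mask(src_len: int) -> torch.Tensor:
    """Causal (lower-triangular) mask: encoder position i may attend to positions <= i."""
    return torch.tril(torch.ones(src_len, src_len, dtype=torch.bool))

def wait_k_cross_attention_mask(tgt_len: int, src_len: int, k: int = 7) -> torch.Tensor:
    """Decoder step t may attend to the first min(k + t, src_len) source positions."""
    mask = torch.zeros(tgt_len, src_len, dtype=torch.bool)
    for t in range(tgt_len):
        visible = min(k + t, src_len)
        mask[t, :visible] = True
    return mask

# Example: with k = 7, the first decoder step sees 7 source tokens, the second sees 8, etc.
print(wait_k_cross_attention_mask(tgt_len=3, src_len=10, k=7).int())
```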
“…Another adjacent research direction enables revision during online translation to alleviate decoding constraints (Niehues et al., 2016; Zheng et al., 2020; Arivazhagan et al., 2020). In this work, we focus on wait-k and greedy decoding strategies, but unlike other wait-k models (Ma et al., 2019; Zheng et al., 2019b; Zheng et al., 2019a) we opt for unidirectional encoders, which are efficient to train in an online setup (Elbayad et al., 2020).…”
Section: Related Work 2.1 Online NMT · Citation type: mentioning
Confidence: 99%
“…First, we pre-train an ST model on the general-domain MT corpus, and then fine-tune the ST model on the spoken-language-domain corpus. For pre-training, we apply multi-path (Elbayad et al., 2020) and future-guided (Zhang et al., 2020b) training to enhance prediction ability and avoid the heavy cost of training a different model for each k. For fine-tuning, we apply the original prefix-to-prefix framework (Ma et al., 2019).…”
Section: Training Methods · Citation type: mentioning
Confidence: 99%
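The multi-path idea mentioned here, training one model across many lags by sampling k per batch rather than fixing it, can be sketched as follows. The `model(...)` signature, the uniform sampling range, and the batch layout are assumptions for illustration, not the cited training code.

```python
import random
import torch
import torch.nn.functional as F

def multi_path_training_step(model, batch, optimizer, k_min=1, k_max=9, pad_id=0):
    """One step of a multi-path wait-k training loop (illustrative sketch only):
    a lag k is sampled per batch so a single model covers many latency regimes."""
    src, tgt, labels = batch["src"], batch["tgt"], batch["labels"]
    k = random.randint(k_min, k_max)  # uniform sampling of the lag is an assumption

    # Wait-k cross-attention mask: decoder step t sees the first min(k + t, |src|) source tokens.
    tgt_len, src_len = tgt.size(1), src.size(1)
    steps = torch.arange(tgt_len).unsqueeze(1)      # (tgt_len, 1)
    positions = torch.arange(src_len).unsqueeze(0)  # (1, src_len)
    cross_mask = positions < (steps + k).clamp(max=src_len)

    optimizer.zero_grad()
    logits = model(src, tgt, cross_attention_mask=cross_mask)  # assumed model signature
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=pad_id)
    loss.backward()
    optimizer.step()
    return loss.item(), k
```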
“…added a Predict operation to the agent based on Gu et al. (2017), predicting the next word as an additional input. Elbayad et al. (2020) enhance the wait-k policy by sampling different values of k during training. Zhang et al. (2020b) proposed future-guided training, which introduces a full-sentence Transformer as the teacher of the ST model and uses future information to guide training through knowledge distillation.…”
Section: Related Work · Citation type: mentioning
Confidence: 99%
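A hedged sketch of the knowledge-distillation objective described here, where a wait-k student is guided by a frozen full-sentence teacher: the mixing weight, temperature, and exact loss combination are illustrative assumptions, not the settings of Zhang et al. (2020b).

```python
import torch
import torch.nn.functional as F

def future_guided_loss(student_logits, teacher_logits, labels, alpha=0.5, T=1.0, pad_id=0):
    """Illustrative distillation objective in the spirit of future-guided training:
    the wait-k student fits the references and a full-sentence teacher's distribution."""
    # Standard cross-entropy against the reference translation.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1), ignore_index=pad_id
    )
    # KL divergence toward the frozen teacher, which has seen the whole source sentence.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kl
```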