Interspeech 2020
DOI: 10.21437/interspeech.2020-1241

Efficient Wait-k Models for Simultaneous Machine Translation

Abstract: Simultaneous machine translation consists in starting output generation before the entire input sequence is available. Wait-k decoders offer a simple but efficient approach to this problem: they first read k source tokens, after which they alternate between producing a target token and reading another source token. We investigate the behavior of wait-k decoding in low-resource settings for spoken corpora using IWSLT datasets. We improve training of these models using unidirectional encoders, and training acro…
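The read/write schedule described in the abstract maps directly onto a small decoding loop. Below is a minimal sketch of that schedule, assuming a greedy `model.decode_step(source_prefix, target_prefix)` interface and a streaming source iterator; both names are hypothetical placeholders, not the paper's actual API.

```python
# Minimal sketch of a wait-k decoding schedule.
# `model.decode_step` and the streaming `source_stream` iterator are assumed interfaces.

def wait_k_decode(model, source_stream, k, max_len=200, eos="</s>"):
    """Alternate WRITE/READ actions after an initial lag of k source tokens."""
    src_tokens = []
    target = []

    # Initial lag: read the first k source tokens (or fewer if the input ends early).
    for _ in range(k):
        tok = next(source_stream, None)
        if tok is None:
            break
        src_tokens.append(tok)

    exhausted = False
    while len(target) < max_len:
        # WRITE: emit one target token conditioned on the source prefix read so far.
        next_word = model.decode_step(src_tokens, target)  # greedy step (assumption)
        if next_word == eos:
            break
        target.append(next_word)

        # READ: consume one more source token, if any remain.
        if not exhausted:
            tok = next(source_stream, None)
            if tok is None:
                exhausted = True
            else:
                src_tokens.append(tok)

    return target
```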

Cited by 56 publications (70 citation statements) · References 14 publications
“…In this work we train Transformer (Vaswani et al., 2017) and Pervasive Attention (Elbayad et al., 2018) models for the tasks of online and offline translation. Following Elbayad et al. (2020), we use unidirectional encoders and train the online MT models with k_train = 7, shown to yield better translations across the latency spectrum. We train our models on the IWSLT'14 De→En (German→English) and En→De (English→German) datasets (Cettolo et al., 2014).…”
Section: Methods · Citation type: mentioning
Confidence: 99%
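As a rough illustration of the unidirectional-encoder wait-k setup referenced in this statement, the sketch below builds the two boolean attention masks such a model would need: a causal mask for encoder self-attention and a wait-k mask for decoder-to-encoder attention. Tensor shapes and the k = 7 default are assumptions for the example, not the cited implementation.

```python
import torch

def unidirectional_encoder_mask(src_len: int) -> torch.Tensor:
    """Causal (lower-triangular) mask: encoder position i may attend to positions <= i."""
    return torch.tril(torch.ones(src_len, src_len, dtype=torch.bool))

def wait_k_cross_attention_mask(tgt_len: int, src_len: int, k: int = 7) -> torch.Tensor:
    """Decoder step t may attend to the first min(k + t, src_len) source positions."""
    mask = torch.zeros(tgt_len, src_len, dtype=torch.bool)
    for t in range(tgt_len):
        visible = min(k + t, src_len)
        mask[t, :visible] = True
    return mask

# Example: with k = 7, the first decoder step sees 7 source tokens, the second sees 8, etc.
print(wait_k_cross_attention_mask(tgt_len=3, src_len=10, k=7).int())
```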
“…Another adjacent research direction enables revision during online translation to alleviate decoding constraints (Niehues et al., 2016; Zheng et al., 2020; Arivazhagan et al., 2020). In this work, we focus on wait-k and greedy decoding strategies, but unlike other wait-k models (Ma et al., 2019; Zheng et al., 2019b; Zheng et al., 2019a) we opt for unidirectional encoders, which are efficient to train in an online setup (Elbayad et al., 2020).…”
Section: Related Work 2.1 Online NMT · Citation type: mentioning
Confidence: 99%
“…First, we pre-train an ST model on the general-domain MT corpus, and then fine-tune the ST model on the spoken-language-domain corpus. For pre-training, we apply multi-path (Elbayad et al., 2020) and future-guided (Zhang et al., 2020b) training to enhance prediction ability and avoid the heavy cost of training a different model for each k. For fine-tuning, we apply the original prefix-to-prefix framework (Ma et al., 2019).…”
Section: Training Methods · Citation type: mentioning
Confidence: 99%
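The multi-path idea mentioned here, training one model across many lags by sampling k per batch rather than fixing it, can be sketched as follows. The `model(...)` signature, the uniform sampling range, and the batch layout are assumptions for illustration, not the cited training code.

```python
import random
import torch
import torch.nn.functional as F

def multi_path_training_step(model, batch, optimizer, k_min=1, k_max=9, pad_id=0):
    """One step of a multi-path wait-k training loop (illustrative sketch only):
    a lag k is sampled per batch so a single model covers many latency regimes."""
    src, tgt, labels = batch["src"], batch["tgt"], batch["labels"]
    k = random.randint(k_min, k_max)  # uniform sampling of the lag is an assumption

    # Wait-k cross-attention mask: decoder step t sees the first min(k + t, |src|) source tokens.
    tgt_len, src_len = tgt.size(1), src.size(1)
    steps = torch.arange(tgt_len).unsqueeze(1)      # (tgt_len, 1)
    positions = torch.arange(src_len).unsqueeze(0)  # (1, src_len)
    cross_mask = positions < (steps + k).clamp(max=src_len)

    optimizer.zero_grad()
    logits = model(src, tgt, cross_attention_mask=cross_mask)  # assumed model signature
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=pad_id)
    loss.backward()
    optimizer.step()
    return loss.item(), k
```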
“…added a Predict operation to the agent based on Gu et al. (2017), predicting the next word as an additional input. Elbayad et al. (2020) enhance the wait-k policy by sampling different values of k during training. Zhang et al. (2020b) proposed future-guided training, which introduces a full-sentence Transformer as the teacher of the ST model and uses future information to guide training through knowledge distillation.…”
Section: Related Work · Citation type: mentioning
Confidence: 99%
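A hedged sketch of the knowledge-distillation objective described here, where a wait-k student is guided by a frozen full-sentence teacher: the mixing weight, temperature, and exact loss combination are illustrative assumptions, not the settings of Zhang et al. (2020b).

```python
import torch
import torch.nn.functional as F

def future_guided_loss(student_logits, teacher_logits, labels, alpha=0.5, T=1.0, pad_id=0):
    """Illustrative distillation objective in the spirit of future-guided training:
    the wait-k student fits the references and a full-sentence teacher's distribution."""
    # Standard cross-entropy against the reference translation.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1), ignore_index=pad_id
    )
    # KL divergence toward the frozen teacher, which has seen the whole source sentence.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kl
```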