ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414276

An Empirical Study of End-To-End Simultaneous Speech Translation Decoding Strategies

Abstract: This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities, including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach makes it possible to control the BLEU/Average Lagging trade-off across different latency regimes. Our best decoding setti…
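For context, "Average Lagging" (AL) in the abstract is the standard simultaneous-translation latency metric of Ma et al. (2019); the formula below restates that general definition and is not reproduced from this paper. For a deterministic policy g(t) giving the number of source units read when the t-th target unit is emitted:

```latex
% Average Lagging (Ma et al., 2019):
%   \gamma = |y| / |x| is the target-to-source length ratio,
%   \tau   = \min \{ t : g(t) = |x| \} is the first target step that
%           has read the entire source x.
\[
  \mathrm{AL} = \frac{1}{\tau} \sum_{t=1}^{\tau} \left( g(t) - \frac{t-1}{\gamma} \right)
\]
```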

Cited by 20 publications (17 citation statements)
References 12 publications
“…Zeng et al. (2021) also integrate beam search into the decoding strategy, developing the wait-k-stride-N strategy. In particular, the authors bypass output speculation by directly applying beam search, after waiting for k words, on a word stride of size N (i.e., on N words at a time) instead of on a single word as prescribed by the standard wait-k. Nguyen et al. (2021a) analyzed several decoding strategies relying on different output token granularities, such as characters and Byte Pair Encoding (BPE), showing that the latter yields lower latency…”
Section: Encoding
confidence: 99%
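To make the two policies above concrete, here is a minimal sketch of wait-k and wait-k-stride-N. It assumes a hypothetical `translate_prefix(source, target, n_words)` that decodes the next `n_words` target words (e.g., with beam search over the stride) given the current source and target prefixes; this is an illustration, not the cited authors' implementation.

```python
def wait_k_stride_n(source_stream, translate_prefix, k=3, stride=1):
    """With stride=1 this reduces to the standard wait-k policy; with
    stride=N, decoding is applied to N words at a time after the wait."""
    source, target = [], []
    for word in source_stream:                  # READ one source word
        source.append(word)
        if len(source) < k:
            continue                            # still in the initial wait
        if (len(source) - k) % stride == 0:     # WRITE every `stride` reads
            new_words = translate_prefix(source, target, n_words=stride)
            target.extend(new_words)
            yield from new_words
    # source exhausted: finish the translation in offline mode
    yield from translate_prefix(source, target, n_words=None)
```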
“…An alternative approach to simultaneous training is offline (full-sentence) training of the system and its subsequent use as a simultaneous one. Nguyen et al. (2021a) explored this solution with an LSTM-based direct ST system, analyzing the effectiveness of different decoding strategies. Interestingly, the offline approach not only preserves overall performance despite the switch of modality, but also improves the system's ability to generate well-formed sentences…”
Section: Encoding
confidence: 99%
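The sketch below illustrates the general idea of reusing an offline-trained model for simultaneous inference: the model re-decodes the growing audio prefix, and (as one possible stability heuristic, not necessarily the strategy studied in the cited work) only the hypothesis prefix shared by two consecutive decoding passes is committed. `model.decode` is a hypothetical full-sentence decoding call.

```python
def longest_common_prefix(a, b):
    """Longest shared prefix of two token lists."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out

def simultaneous_from_offline(model, audio_chunks):
    committed, prefix, prev_hyp = [], [], []
    for chunk in audio_chunks:                 # audio arrives chunk by chunk
        prefix.extend(chunk)
        hyp = model.decode(prefix)             # offline decoding of the prefix
        stable = longest_common_prefix(prev_hyp, hyp)
        for token in stable[len(committed):]:  # emit newly stabilized tokens
            committed.append(token)
            yield token
        prev_hyp = hyp
    final = model.decode(prefix)               # final full-sentence pass
    yield from final[len(committed):]
```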
“…The benefits of training a system in conditions similar to the inference setting have so far been taken for granted. Although the literature includes works employing models trained offline, this has always been motivated by computational limits (Nguyen et al., 2021). Being aware of the social and environmental impact caused by the high computational costs of SimulST models (Schwartz et al., 2020), in this work we question this standard approach and ask: does simultaneous speech translation actually need a simultaneously trained model?…”
Section: Introduction
confidence: 99%
“…Most of the previous work used fixed policies. Some adopt a fixed-length policy (Nguyen et al., 2021; Ma et al., 2020b) that splits the speech at a fixed frequency, for example generating one target word every T_s ms (Figure 1 (a)). Other work adopts a word-based policy that splits the speech into words and generates one target word whenever a new source word is detected, which calls for an auxiliary source word detector (Ren et al., 2020; Elbayad et al., 2020; Ma et al., 2020b; Zeng et al., 2021; Chen et al., 2021); see Figure 1 (b)…”
Section: Introduction
confidence: 99%
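A minimal sketch of the fixed-length policy described in this statement: one target word is emitted for every T_s ms of received audio. `model.next_word` and `model.finish` are hypothetical prefix-to-prefix decoding calls, and the default values are illustrative only.

```python
def fixed_length_policy(model, audio_frames, t_s=280, frame_ms=10):
    frames_per_write = t_s // frame_ms     # frames to read before each WRITE
    audio, target = [], []
    since_write = 0
    for frame in audio_frames:             # READ one audio frame at a time
        audio.append(frame)
        since_write += 1
        if since_write == frames_per_write:
            since_write = 0
            word = model.next_word(audio, target)  # WRITE one target word
            target.append(word)
            yield word
    # input finished: let the model complete the sentence offline
    yield from model.finish(audio, target)
```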