Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.542
Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation

Abstract: End-to-end simultaneous speech-to-text translation aims to directly perform translation from streaming source speech to target text with high translation quality and low latency. A typical simultaneous translation (ST) system consists of a speech translation model and a policy module, which determines when to wait and when to translate. The policy is therefore crucial for balancing translation quality and latency. Conventional methods usually adopt fixed policies, e.g., segmenting the source speech with a fixed length …
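The fixed policy the abstract contrasts against can be illustrated as a simple chunking loop that triggers translation purely on input length. The sketch below is a hypothetical illustration, not the paper's system: `translate_chunk` is a placeholder for a speech translation model, and the 300 ms chunk size is an assumed example value.

```python
# Illustrative sketch of a FIXED segmentation policy:
# emit a translation every time a fixed amount of audio arrives,
# regardless of the content of the speech.

def translate_chunk(samples):
    # Placeholder: a real system would run a speech translation
    # model (encoder-decoder) over the buffered audio here.
    return f"<translation of {len(samples)} samples>"

def fixed_policy_stream(audio_stream, sample_rate=16000, chunk_ms=300):
    """Yield one translation per fixed-length chunk of samples."""
    chunk_size = sample_rate * chunk_ms // 1000  # samples per segment
    buffer = []
    for sample in audio_stream:
        buffer.append(sample)
        if len(buffer) >= chunk_size:      # fixed trigger: length only
            yield translate_chunk(buffer)  # translate, content-blind
            buffer = []
    if buffer:                             # flush the remaining tail
        yield translate_chunk(buffer)

# One second of dummy audio at 16 kHz -> 3 full 4800-sample chunks
# plus a 1600-sample tail, i.e. 4 outputs.
outputs = list(fixed_policy_stream(range(16000)))
```

Because the trigger ignores content, such a policy can cut through a word or phrase, which is exactly the weakness an adaptive (learned) policy addresses.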

Cited by 5 publications (7 citation statements)
References 28 publications
“…This approach detects whether the translation of a sequence of source tokens forms a prefix of the full sentence's translation. This method was further generalized to speech translation in MU-ST (Zhang et al., 2022a).…”
Section: Related Work
confidence: 99%
“…MU-ST: segmentation based on meaning units (Zhang et al., 2022a), which trains an external segmentation model on constructed data and uses it to decide when to translate.…”
Section: System Settings
confidence: 99%
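The adaptive policy described in this citation statement, where a learned segmenter rather than a fixed length decides when to translate, can be sketched as a decision loop. Everything below is a hypothetical illustration: `is_meaning_unit_boundary` is a toy heuristic stand-in for MU-ST's trained segmentation model, not the actual model.

```python
# Hedged sketch of an ADAPTIVE segmentation policy: an external
# segmenter scores the buffered input and triggers translation at
# predicted meaning-unit boundaries. The scorer here is a toy
# punctuation heuristic, NOT a trained model.

def is_meaning_unit_boundary(tokens):
    # Toy stand-in for a learned boundary classifier.
    return bool(tokens) and tokens[-1] in {".", ",", "?"}

def translate(tokens):
    # Placeholder for a speech/text translation model.
    return f"<translation of {len(tokens)} tokens>"

def adaptive_policy_stream(token_stream):
    """Translate whenever the segmenter predicts a unit boundary."""
    buffer = []
    for tok in token_stream:
        buffer.append(tok)
        if is_meaning_unit_boundary(buffer):  # adaptive trigger
            yield translate(buffer)
            buffer = []
    if buffer:  # flush any incomplete unit at end of stream
        yield translate(buffer)

stream = ["wait", ",", "then", "translate", "."]
outputs = list(adaptive_policy_stream(stream))
```

The contrast with a fixed policy is in the trigger condition only: the loop structure is the same, but the decision to translate depends on the content received so far rather than on elapsed input length.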
“…End-to-end simultaneous speech translation (SimulST) (Fügen et al., 2007; Oda et al., 2014; Ren et al., 2020; Zeng et al., 2021; Zhang et al., 2022a) outputs translations while receiving streaming speech input, and is widely used in real-time scenarios such as international conferences, live broadcasts, and real-time subtitles. Compared with offline speech translation, which waits for the complete speech input (Wang et al., 2020), SimulST must segment the streaming speech input and translate synchronously based on the speech received so far, aiming to achieve high translation quality under low latency (Hamon et al., 2009; Cho and Esipova, 2016; Ma et al., 2020b; Zhang and Feng, 2022c).…”
Section: Introduction
confidence: 99%