Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.350
SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Cited by 141 publications (254 citation statements) · References 19 publications
“…We use a simplified version of [6] as our baseline model. While in [6] the simultaneous policy operates on word boundaries generated by a separate model, we simply use the fixed-decision module introduced by [7]. Our choice is motivated by the fact that in [7], a fixed chunk size gave quality-latency trade-offs similar to those of word boundaries.…”
Section: Methods
confidence: 99%
“…Note that the above-mentioned latency metrics were all proposed for text-to-text simultaneous translation, and we use AL in the text track for latency evaluation. Some work extended AP and AL to speech translation (Ren et al, 2020; Ma et al, 2020), but we do not use those variants because they measure real-time latency, and some submissions that call remote services incur network delay. It would be unfair to compare locally running and remotely running systems under real-time latency metrics.…”
Section: Evaluation Metrics
confidence: 99%
“…For example, a pre-defined translation of a named entity can be introduced to the MT module. However, controllability is not easy to guarantee for end-to-end simultaneous translation systems (Ren et al, 2020; Ma et al, 2020). It remains a challenge to correct a translation without an intermediate ASR result.…”
Section: Applications
confidence: 99%
“…Simultaneous translation, the task of generating translations before reading the entire text or speech source input, has become an increasingly popular topic for both text and speech translation (Grissom II et al, 2014; Cho and Esipova, 2016; Gu et al, 2017; Alinejad et al, 2018; Arivazhagan et al, 2019; Ma et al, 2020; Ren et al, 2020). Simultaneous models are typically evaluated from the quality and latency perspectives.…”
Section: Introduction
confidence: 99%
“…While the translation quality is usually measured by BLEU (Papineni et al, 2002; Post, 2018), a wide variety of latency measurements have been introduced, such as Average Proportion (AP) (Cho and Esipova, 2016), Consecutive Wait length (CW) (Gu et al, 2017), Average Lagging (AL), Differentiable Average Lagging (DAL) (Cherry and Foster, 2019), and so on. Unfortunately, latency evaluation is not consistent across different works: 1) the latency metric definitions are not precise enough with respect to text segmentation; 2) the definitions are also not precise enough with respect to speech segmentation, for example some models are evaluated on speech segments (Ren et al, 2020) while others are evaluated on time duration (Ansari et al, 2020); 3) little prior work has released implementations of the decoding process and latency measurement. This lack of clarity and consistency makes it challenging to compare different works and prevents tracking the scientific progress of this field.…”
Section: Introduction
confidence: 99%
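The two latency metrics most often cited above can be computed from the read schedule g(t), the number of source tokens consumed before emitting target token t. A minimal sketch under the standard token-level, 0-indexed definitions (the function names are my own; this is not the authors' released implementation):

```python
def average_proportion(g, src_len):
    """AP (Cho and Esipova, 2016): mean fraction of the source
    read before each target token is emitted."""
    return sum(g) / (src_len * len(g))

def average_lagging(g, src_len):
    """AL (Ma et al., 2019): average number of source tokens the
    system lags behind an ideal simultaneous translator, computed
    up to the first target token emitted after the full source
    has been read."""
    gamma = len(g) / src_len  # target-to-source length ratio
    # tau: index of the first target token for which the entire
    # source was available (assumes g eventually reaches src_len).
    tau = next(t for t, gt in enumerate(g) if gt >= src_len)
    return sum(g[t] - t / gamma for t in range(tau + 1)) / (tau + 1)
```

For a wait-2 schedule on a 4-token source and 4-token target, g = [2, 3, 4, 4], AL comes out to 2.0, matching the intuition that wait-k lags by about k tokens when source and target lengths are equal.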