Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.661

Speech Translation and the End-to-End Promise: Taking Stock of Where We Are

Abstract: Over its three decade history, speech translation has experienced several shifts in its primary research themes; moving from loosely coupled cascades of speech recognition and machine translation, to exploring questions of tight coupling, and finally to end-to-end models that have recently attracted much attention. This paper provides a brief survey of these developments, along with a discussion of the main challenges of traditional approaches which stem from committing to intermediate representations from the…

Citations: cited by 63 publications (56 citation statements)
References: references 70 publications (67 reference statements)
“…Similarly, Bahar et al. (2021) use 2300 hours of ASR data and 27M sentences of MT data for pretraining. Our competitive performance without the use of any additional data highlights the data-efficient nature of our proposed end-to-end framework as opposed to the baseline encoder-decoder model, as pointed out by Sperber and Paulik (2020).…”
Section: Extending To MuST-C Language Pairs
Citation type: mentioning
Confidence: 79%
“…While the traditional cascading approach to automating free translations (using two models, ASR and MT) shows strong results on many datasets, recent works have also shown competitive results using end-to-end systems that directly output translations from speech using a single model (Jan et al., 2019; Sperber and Paulik, 2020; Ansari et al., 2020). For low-resource settings, in particular, the data efficiencies of different methodologies become key performance factors (Bansal et al., 2018; Sperber et al., 2019).…”
Section: Speech Translation (ST)
Citation type: mentioning
Confidence: 99%
“…Automatic Dubbing (AD) is the task of automatically replacing the speech in a video document with speech in a different language, while preserving as much as possible the user experience of the original video. AD dubbing differs from speech translation [1,2,3,4] in significant ways. In speech translation, a speech utterance in the source language is recognized, translated (and possibly synthesized) in the target language.…”
Section: Introduction
Citation type: mentioning
Confidence: 97%