Proceedings of the 17th International Conference on Spoken Language Translation 2020
DOI: 10.18653/v1/2020.iwslt-1.30

Adapting End-to-End Speech Recognition for Readable Subtitles

Abstract: Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task challenging for supervised approaches due to the scarcity of training data. We first investigate a cascaded system, where an unsupervised compression model is used to post-edit the transcribed speech. We the…
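As a rough illustration of the cascaded setup described in the abstract, the sketch below chains a transcription step with a separate compression step that post-edits the transcript to a character budget. Both stage functions are illustrative stand-ins (assumptions), not the paper's actual ASR or unsupervised compression models.

```python
"""Toy sketch of a cascaded pipeline: ASR, then compression post-editing.

The stand-in functions are assumptions for illustration only; a real system
would plug in a trained ASR decoder and an unsupervised sentence-compression
component as described in the abstract.
"""
from typing import Callable


def cascade(asr: Callable[[bytes], str],
            compress: Callable[[str, int], str],
            audio: bytes,
            max_chars: int) -> str:
    """Transcribe the audio, then post-edit the transcript to fit max_chars."""
    transcript = asr(audio)
    return compress(transcript, max_chars)


def dummy_asr(audio: bytes) -> str:
    # Stand-in transcript; a real ASR model would decode the audio here.
    return "so basically what we want to do here is make the subtitles shorter"


def dummy_compress(text: str, max_chars: int) -> str:
    # Naive "compression": drop filler words, then truncate at a word boundary.
    fillers = {"so", "basically", "here", "what"}
    words = [w for w in text.split() if w not in fillers]
    out = ""
    for w in words:
        if len(f"{out} {w}".strip()) > max_chars:
            break
        out = f"{out} {w}".strip()
    return out


if __name__ == "__main__":
    print(cascade(dummy_asr, dummy_compress, b"", max_chars=40))
```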

Cited by 7 publications (7 citation statements). References 21 publications.
“…Using subwords (Sennrich et al., 2016) as the output unit of the decoder is more common in state-of-the-art systems (Akhbardeh et al., 2021). In this case, one can either encode the target length in terms of the number of subword tokens (Liu et al., 2020; Niehues, 2020; Buet and Yvon, 2021), or keep the character-level encoding, which however requires subtracting the number of characters in the predicted subword token in each decoding step (Lakew et al., 2019). The former has the disadvantage that the number of subword tokens is a less direct measure of translation length, especially for the case of the IWSLT Isometric SLT task, where length compliance is measured in terms of the number of characters.…”
Section: Length Encoding
confidence: 99%
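The character-level length control mentioned in the statement above can be illustrated with a small decoding loop that subtracts the character length of each predicted subword from a remaining budget, in the spirit attributed to Lakew et al. (2019). The toy vocabulary, the random scorer, and the function names below are assumptions made for illustration, not an implementation from any of the cited papers.

```python
"""Toy sketch of character-budget tracking during subword decoding."""
import random

VOCAB = ["▁the", "▁sub", "title", "▁is", "▁short", "."]  # toy subword vocabulary


def char_len(token: str) -> int:
    # "▁" marks a word boundary in SentencePiece-style vocabularies;
    # count it as a single space character towards the budget.
    return len(token.replace("▁", " "))


def decode_with_char_budget(max_chars: int, max_steps: int = 20):
    """Decoding loop that never exceeds max_chars output characters."""
    output, remaining = [], max_chars
    for _ in range(max_steps):
        # Stub scorer: a real system would use the decoder's next-token logits.
        candidates = [t for t in VOCAB if char_len(t) <= remaining]
        if not candidates:
            break
        token = random.choice(candidates)
        output.append(token)
        remaining -= char_len(token)  # subtract characters of the predicted subword
    return "".join(output).replace("▁", " ").strip(), max_chars - remaining


if __name__ == "__main__":
    text, used = decode_with_char_budget(max_chars=25)
    print(f"{used:2d} chars: {text!r}")
```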
“…The automation of closed captioning is an early application of automatic speech transcription and machine translation technologies [1,2,3], and progress in this area has been steady, owing to the improvement of the underlying technologies. While initially developed as sophisticated pipelines, neural sequence-to-sequence models have opened the prospect of integrated, end-to-end training for these systems [4,5]. Recent efforts have mainly focused on subtitling internet content, such as talks and classes.…”
Section: Genres in TV Shows
confidence: 99%
“…Regarding the quality of the text, we use the BLEU score [24] with respect to reference captions, as well as the SARI [13] metric, which not only scores the similarity to the reference but also rewards divergences (likely simplifications) from the input text. Another measure of simplification is the Flesch Reading Ease (FRE) index (larger is simpler), adapted to French in [26].…”
Section: Evaluation Metrics
confidence: 99%
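For reference, the sketch below computes the standard English Flesch Reading Ease index mentioned in the statement above, FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words), where higher values indicate simpler text. The French adaptation cited in [26] uses different coefficients, and the vowel-group syllable counter here is a crude heuristic assumed only for illustration.

```python
"""Rough sketch of the English Flesch Reading Ease (FRE) index."""
import re


def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels; real tools use pronunciation dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)


if __name__ == "__main__":
    print(round(flesch_reading_ease("Short subtitles are easy to read."), 1))
```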
“…Recently, Karakanta, Negri, and Turchi (2020b) trained a sequence-to-sequence model which receives a full sentence and generates the same sentence, inserting symbols which correspond to subtitle breaks. Focusing on the length constraint, Liu, Niehues, and Spanakis (2020) proposed adapting an Automatic Speech Recognition (ASR) system to incorporate transcription and text compression, for generating more readable subtitles.…”
Section: Readable Subtitles
confidence: 99%
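A minimal sketch of the kind of break-annotated targets described in the statement above: a plain sentence is mapped to the same sentence with inserted break symbols. The <eol>/<eob> markers, the 42-character line limit, and the greedy wrapping rule are simplifying assumptions made here, not the actual data preparation of Karakanta, Negri, and Turchi (2020b).

```python
"""Minimal sketch of building training targets with subtitle-break symbols."""

MAX_CHARS_PER_LINE = 42   # a common subtitling guideline, assumed here
LINES_PER_SUBTITLE = 2    # two lines per subtitle block, assumed here


def add_break_symbols(sentence: str) -> str:
    """Greedily wrap words into lines and mark line/block boundaries."""
    words, lines, current = sentence.split(), [], ""
    for word in words:
        candidate = f"{current} {word}".strip()
        if len(candidate) > MAX_CHARS_PER_LINE and current:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    # Insert <eol> between lines of a block and <eob> between subtitle blocks.
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if i == len(lines) - 1:
            break
        out.append("<eol>" if (i + 1) % LINES_PER_SUBTITLE else "<eob>")
    return " ".join(out)


if __name__ == "__main__":
    src = "automatic speech recognition systems are primarily evaluated on transcription accuracy"
    print(add_break_symbols(src))
```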