Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021) 2021
DOI: 10.18653/v1/2021.iwslt-1.26

Between Flexibility and Consistency: Joint Generation of Captions and Subtitles

Abstract: Speech translation (ST) has lately received growing interest for the generation of subtitles without the need for an intermediate source language transcription and timing (i.e. captions). However, the joint generation of source captions and target subtitles does not only bring potential output quality advantages when the two decoding processes inform each other, but it is also often required in multilingual scenarios. In this work, we focus on ST models which generate consistent captions-subtitles in terms of …

Cited by 7 publications (8 citation statements)
References 29 publications
“…In addition to BLEU score, measuring the consistency between captions and subtitles is also an important aspect. We reuse the structural and lexical consistency score proposed by Karakanta et al (2021). Structural consistency measures the percentage of utterances having the same number of blocks in both languages, while lexical scores count the proportion of words in the two languages that are aligned in the same block (refer to Appendix C for additional details).…”
Section: Experimental Settings (mentioning)
confidence: 99%
“…As defined by Karakanta et al (2021), for the structural consistency, both captions (EN) and subtitles (FR) have the same number of 3 blocks. For lexical consistency, there are 6 tokens of the subtitles which are not aligned to captions in the same block: "le capitalisme ," , "au même titre".…”
Section: Consistency Score (mentioning)
confidence: 99%
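The two consistency scores described in the citation statements above can be illustrated with a minimal Python sketch. It is one plausible reading of those descriptions, not the implementation of Karakanta et al (2021) or of the citing papers. The `<eob>` block delimiter, the function names, and the alignment format (a list of caption-index/subtitle-index pairs produced by some external word aligner) are all assumptions made for illustration. Lexical consistency is computed here in one direction only, from the subtitle side.

```python
from typing import List, Sequence, Tuple

# Assumed block delimiter; the text above does not say how blocks are marked.
EOB = "<eob>"


def count_blocks(text: str) -> int:
    """Count the non-empty blocks of a text whose blocks end with EOB."""
    return sum(1 for block in text.split(EOB) if block.strip())


def structural_consistency(captions: Sequence[str], subtitles: Sequence[str]) -> float:
    """Percentage of utterances with the same number of blocks in both languages."""
    if len(captions) != len(subtitles):
        raise ValueError("captions and subtitles must be parallel")
    if not captions:
        return 0.0
    same = sum(count_blocks(c) == count_blocks(s) for c, s in zip(captions, subtitles))
    return 100.0 * same / len(captions)


def block_ids(text: str) -> List[int]:
    """Block index of every word; EOB must be whitespace-separated and is skipped."""
    ids: List[int] = []
    current = 0
    for token in text.split():
        if token == EOB:
            current += 1
        else:
            ids.append(current)
    return ids


def lexical_consistency(
    caption: str, subtitle: str, alignment: Sequence[Tuple[int, int]]
) -> float:
    """Share of subtitle words aligned to a caption word in the same block.

    `alignment` holds (caption_word_index, subtitle_word_index) pairs, with
    indices counted over words only (delimiters excluded). Subtitle words
    without any alignment count as not matched. This is an assumption.
    """
    cap_blocks = block_ids(caption)
    sub_blocks = block_ids(subtitle)
    if not sub_blocks:
        return 0.0
    matched = set()
    for cap_idx, sub_idx in alignment:
        if cap_blocks[cap_idx] == sub_blocks[sub_idx]:
            matched.add(sub_idx)
    return len(matched) / len(sub_blocks)
```

Under these assumptions, the example in the statement above (3 blocks on each side) would count as structurally consistent, and the 6 subtitle tokens not aligned within the same block would lower the lexical score for that utterance.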
“…Notable references are (Dong et al, 2015), which introduces a multi-task framework; , which studies ways to strengthen a basic multilingual decoder; while closer to our work, consider a dual decoder relying on dual self-attention mechanism. Related techniques have also been used to simultaneously generate a transcript and a translation for a spoken input (Anastasopoulos and Chiang, 2018;Le et al, 2020) and to generate consistent caption and subtitle for an audio source (Karakanta et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…The convergence of model structures for ASR and ST inspires works that use a single model to perform both ASR and ST [8,9,10,11,12]. Liu et al proposed an interactive decoding strategy between ASR and ST [13].…”
Section: Introduction (mentioning)
confidence: 99%