Interspeech 2020
DOI: 10.21437/interspeech.2020-2983

Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing

Cited by 20 publications (43 citation statements)
References 18 publications
“…We build on the automatic dubbing architecture presented in [7,8], and described in Figure 1, that extends a speech-to-speech translation [1,2,3] pipeline with: neural machine translation (MT) robust to ASR errors and able to control verbosity of the output [12,10,13]; prosodic alignment (PA) [6] which addresses phrase-level synchronization of the MT output by leveraging the force-aligned source transcript; neural text-to-speech (TTS) [14,15,16] with precise duration control; and, finally, audio rendering that enriches TTS output with the original background noise (extracted via audio source separation with deep U-Nets [17,18]) and reverberation, estimated from the original audio [19,20].…”
Section: Dubbing Architecture (mentioning)
confidence: 99%
“…In the past, there has been little work to address prosodic alignment for automatic dubbing [6,7,8]. The work of [6] utilized the attention mechanism of neural machine translation to achieve isochrony.…”
Section: Related Work (mentioning)
confidence: 99%