ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414966

Improvements to Prosodic Alignment for Automatic Dubbing

Abstract: Automatic dubbing is an extension of speech-to-speech translation such that the resulting target speech is carefully aligned in terms of duration, lip movements, timbre, emotion, prosody, etc. of the speaker in order to achieve audiovisual coherence. Dubbing quality strongly depends on isochrony, i.e., arranging the translation of the original speech to optimally match its sequence of phrases and pauses. To this end, we present improvements to the prosodic alignment component of our recently introduced dubbing…

Cited by 11 publications (20 citation statements)
References 18 publications
“…In the past there has been little work to address isochrony in dubbing [6,7,8,9]. The approach of [6] involved generating and rescoring segmentation hypotheses by utilizing the attention weights of neural machine translation.…”
Section: Related Work (mentioning)
confidence: 99%
“…High quality video dubbing usually involves speech synchronization at the utterance level (isochrony), lip movement level (phonetic synchrony) and body movement level (kinetic synchrony). In the past, most work on AD [6,7,8,9] addressed isochrony, i.e., translating original speech by optimally matching its sequence of phrases and pauses. The idea is to first machine translate the source transcript by generating output with roughly the same duration [10,11] -i.e.…”
Section: Introduction (mentioning)
confidence: 99%