ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747023

ISOMETRIC MT: Neural Machine Translation for Automatic Dubbing

citations
Cited by 8 publications
(4 citation statements)
references
References 17 publications
“…In these cases, we are considering context in the sense of the use case for translation: the setting for which the particular system is designed, which may inform choices about the model or about desired qualities of the MT output. Translation for dubbing requires producing translations of similar lengths to the source (Lakew et al, 2022), while translation for subtitling may require handling additional formatting issues (Cherry et al, 2021), and simultaneous MT requires approaches to better handle potential variations in word order (Grissom II et al, 2014) or incremental decoding approaches (Gu et al, 2017;Dalvi et al, 2018). Similarly, NMT systems for use in computer-aided translation settings such as interactive translation prediction may use modified decoding or training approaches (Knowles and Koehn, 2016;Wuebker et al, 2016;Li et al, 2021).…”
Section: World Knowledge and External Information
Citation type: mentioning
confidence: 99%
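The length requirement mentioned in this statement can be illustrated with a minimal sketch. It assumes length is measured in characters and that a translation counts as length-matched when it falls within a tolerance of the source length; the 10% default, the function names, and the example strings are illustrative assumptions, not details taken from the statement above.

```python
def length_ratio(source: str, target: str) -> float:
    """Return the character-length ratio of target to source."""
    if not source:
        raise ValueError("source must be non-empty")
    return len(target) / len(source)


def is_length_matched(source: str, target: str, tolerance: float = 0.10) -> bool:
    """Check whether the target length is within `tolerance` of the source length.

    The 10% default is an assumed threshold chosen for illustration.
    """
    return abs(length_ratio(source, target) - 1.0) <= tolerance


if __name__ == "__main__":
    src = "Where are you going?"
    for hyp in ["Wohin gehst du denn?", "Wohin willst du denn jetzt eigentlich gehen?"]:
        print(f"{hyp!r}: ratio={length_ratio(src, hyp):.2f}, "
              f"matched={is_length_matched(src, hyp)}")
```

A character count is only a rough proxy for spoken duration; a system could substitute syllable or phoneme counts without changing the structure of the check.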
“…• HW-TSC: In contrast to our three baselines, took a more traditional approach to dubbing and followed the prior works on verbosity control (Lakew et al, 2021, 2019) to first generate a set of translation candidates and later re-rank them. Their system consists of four parts: 1) voice activity detection followed by pause alignment, 2) generating a list of translation candidates, 3) phoneme duration prediction, followed by 4) re-ranking/scaling the candidates based on the durations (see Figure 6).…”
Section: Submissions
Citation type: mentioning
confidence: 99%
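The generate-then-re-rank idea described in this statement can be sketched as follows. This is not the HW-TSC system: the `Candidate` fields, the scoring formula, and the `weight` parameter are hypothetical, and the predicted durations are assumed to come from some external phoneme duration predictor.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    mt_score: float            # hypothetical model score, higher is better
    predicted_duration: float  # seconds, assumed output of a duration predictor


def rerank(candidates: list[Candidate], source_duration: float,
           weight: float = 1.0) -> list[Candidate]:
    """Order candidates by MT score minus a penalty for duration mismatch.

    The penalty is the relative difference between the predicted target
    duration and the source speech duration (an assumed scoring choice).
    """
    if source_duration <= 0:
        raise ValueError("source_duration must be positive")

    def combined(c: Candidate) -> float:
        mismatch = abs(c.predicted_duration - source_duration) / source_duration
        return c.mt_score - weight * mismatch

    return sorted(candidates, key=combined, reverse=True)


if __name__ == "__main__":
    pool = [
        Candidate("a fluent but long rendering", mt_score=-0.5, predicted_duration=4.0),
        Candidate("a tighter rendering", mt_score=-0.6, predicted_duration=2.1),
    ]
    best = rerank(pool, source_duration=2.0)[0]
    print(best.text)
```

Setting `weight` to 0 reduces the ranking to the translation score alone, so the parameter controls how strongly timing is traded against translation quality.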
“…So far such dubbing has been produced only for movies after the fact but it is costly, requires considerable human effort, and the result is frequently not convincing when the original video and the target voice and language don't properly align. One proposed solution to improve on these problems is to apply isometric human or machine translation [2,30], where speech translation is performed on an original video source in a manner that optimizes a temporal match between the translator's generated output text and the original video. With isometric translation a better dubbing could thus be achieved, but the dubbed speech from a voice talent (or synthetic voice) in the output language still does not match well with the lip movement and the voice of the original speaker in the original video.…”
Section: Introduction
Citation type: mentioning
confidence: 99%