ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746914

Improving Cross-Lingual Speech Synthesis with Triplet Training Scheme

Cited by 5 publications (9 citation statements)
References 9 publications
“…This task has many applications, such as code-mixed speech synthesis for a voice agent, foreign movie dubbing [4], and computer-assisted pronunciation teaching [5]. Due to the difficulty of obtaining a bilingual corpus produced by a highly proficient speaker in both languages, more practically, current studies mainly build a cross-lingual TTS system based on corpora from monolingual speakers in different languages [6], [7], [8], [9]. However, these approaches mostly ignore modeling emotion aspects during speech generation, while emotion is a kind of indispensable paralinguistic information that reveals the speaker's intentions and moods.…”
Section: Introduction (mentioning)
confidence: 99%
“…The reason for this phenomenon is that each speaker in the training set speaks only one language, and the entanglement between different speech factors, such as linguistic content, speaker identity, and emotion, makes it hard to only transfer the speaker's timbre across different languages. Therefore, the key to alleviating this foreign accent issue is how to properly disentangle the speaker and language or linguistic content [8], [9], [10].…”
Section: Introduction (mentioning)
confidence: 99%
“…Over the recent years, various approaches have been proposed to build cross-lingual Text-to-Speech (TTS) systems [1][2][3]. However, L2 (second-language) accents frequently appear in such cross-lingual scenarios, and several attempts have been made to improve its nativeness [1][2][3].…”
Section: Introduction (mentioning)
confidence: 99%
“…Over the recent years, various approaches have been proposed to build cross-lingual Text-to-Speech (TTS) systems [1][2][3]. However, L2 (second-language) accents frequently appear in such cross-lingual scenarios, and several attempts have been made to improve its nativeness [1][2][3]. To make matters even worse, subjective evaluation of less major languages is challenging, especially for researchers in a less diverse environment.…”
Section: Introduction (mentioning)
confidence: 99%