ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682927
End-to-end Code-switched TTS with Mix of Monolingual Recordings

Cited by 30 publications (16 citation statements)
References 11 publications
“…Different from Tacotron, we use the duration from the ASR model for frame expansion instead of attention-based soft alignment. To represent and control the speaker identity and accent, we separately add accent embedding into the encoder and speaker embedding into the decoder [26]. To wipe out speaker-related information in the encoder output, we add an auxiliary speaker classifier after the encoder and an adversarial training strategy is adopted, which will be introduced in detail in Section 3.3.…”
Section: Overview and Model Architecture
mentioning confidence: 99%
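
The statement above describes a concrete layout: ASR-derived durations replace attention for frame expansion, an accent embedding is added at the encoder, a speaker embedding at the decoder, and an adversarial speaker classifier removes speaker information from the encoder output. The following PyTorch sketch is a minimal illustration of that layout, not the cited authors' code; all module names, dimensions, and the gradient-reversal formulation are assumptions.

    # Minimal sketch (illustrative, not the authors' implementation) of:
    # duration-based frame expansion, accent embedding at the encoder,
    # speaker embedding at the decoder, and an adversarial speaker
    # classifier with gradient reversal on the encoder output.
    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; reversed, scaled gradient backward."""
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.clone()

        @staticmethod
        def backward(ctx, grad_out):
            return -ctx.lam * grad_out, None

    def expand_by_duration(encodings, durations):
        """Repeat each phoneme encoding by its (ASR-derived) frame count."""
        # encodings: (T_phone, D); durations: (T_phone,) integer tensor
        return torch.repeat_interleave(encodings, durations, dim=0)

    class CodeSwitchedAcousticModel(nn.Module):
        def __init__(self, n_phones=100, n_speakers=2, n_accents=2,
                     d_model=256, n_mels=80):
            super().__init__()
            self.phone_emb = nn.Embedding(n_phones, d_model)
            self.accent_emb = nn.Embedding(n_accents, d_model)    # added at encoder
            self.speaker_emb = nn.Embedding(n_speakers, d_model)  # added at decoder
            self.encoder = nn.GRU(d_model, d_model)
            self.decoder = nn.GRU(d_model, d_model)
            self.mel_proj = nn.Linear(d_model, n_mels)
            # auxiliary classifier that tries to recover the speaker from the
            # encoder output; gradient reversal pushes the encoder to discard
            # speaker-related information
            self.spk_classifier = nn.Linear(d_model, n_speakers)

        def forward(self, phones, durations, speaker_id, accent_id, adv_lambda=1.0):
            x = self.phone_emb(phones) + self.accent_emb(accent_id)  # (T_phone, D)
            enc, _ = self.encoder(x.unsqueeze(1))                    # (T_phone, 1, D)
            enc = enc.squeeze(1)

            # adversarial branch on the (ideally speaker-independent) encoder output
            spk_logits = self.spk_classifier(GradReverse.apply(enc, adv_lambda))

            # explicit duration-based frame expansion, no attention alignment
            frames = expand_by_duration(enc, durations)              # (T_frame, D)
            frames = frames + self.speaker_emb(speaker_id)
            dec, _ = self.decoder(frames.unsqueeze(1))
            mel = self.mel_proj(dec.squeeze(1))                      # (T_frame, n_mels)
            return mel, spk_logits

In training, the total loss would combine a mel reconstruction loss with the cross-entropy of spk_logits against the true speaker, so that the reversed gradient discourages the encoder from encoding speaker identity.
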
“…Therefore, our approach can be seen as analogous to these works. Our work is closest to [20,21] in that we use monolingual recordings. However, we explicitly work in the latent prior space, while [20] operate at the level of encoding individual languages and [21] begin with an average voice and refine it using phoneme-informed attention.…”
Section: Synthesis of Code-Mixed Text
mentioning confidence: 99%
“…In parallel with the ZS-TTS, multilingual TTS has also evolved, aiming at learning models for multiple languages at the same time [14,15,16,17]. Some of these models are particularly interesting as they allow for code-switching, i.e.…”
Section: Introduction
mentioning confidence: 99%