Interspeech 2022
DOI: 10.21437/interspeech.2022-988
FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS

Cited by 3 publications (5 citation statements)
References 0 publications
“…Diffusion Models and Backdoors. Diffusion Models have attracted a lot of researchers, to propose different models (Ho, Jain, and Abbeel 2020; Song and Ermon 2019, 2020; Karras et al 2022; Rombach et al 2022) and different applications (Xu et al 2022; Jeong et al 2021; Popov et al 2021; Kim, Kim, and Yoon 2022; Kong et al 2021; Mei and Patel 2022; Ho et al 2022; Ruiz et al 2023). A lot of methods are proposed to deal with the slow sampling process (Song, Meng, and Ermon 2021; Lu et al 2022a,b; Zhao et al 2023; Karras et al 2022). Though they achieve a huge success, they are vulnerable to backdoor attacks (Chou, Chen, and Ho 2023a; Chen, Song, and Li 2023; Chou, Chen, and Ho 2023b).…”
Section: Related Work
confidence: 99%
“…Generative AIs become increasingly popular due to their applications in different synthesis or editing tasks (Couairon et al 2023;Meng et al 2022;Zhang et al 2022). Among the different types of generative AI models, Diffusion Models (DM) (Ho, Jain, and Abbeel 2020;Song and Ermon 2019;Karras et al 2022) are the recent driving force because of their superior ability to produce high-quality and diverse samples in many domains (Xu et al 2022;Jeong et al 2021;Popov et al 2021;Kim, Kim, and Yoon 2022;Kong et al 2021;Mei and Patel 2022;Ho et al 2022), and their more stable training than the adversarial training in traditional Generative Adversarial Networks (Goodfellow et al 2014;Arjovsky, Chintala, and Bottou 2017;Miyato et al 2018).…”
Section: Introduction
confidence: 99%
“…GAN-based models [2,19] have brought innovative contributions to TTS in the last decade using adversarial training strategy. Recently, another successful generative approach, diffusion-based methods [11,12,13,9], have been proposed in speech synthesis, as the diffusion methods have proved their effectiveness in various generation tasks [21,8,7]. Compared to GAN-based models, diffusion methods have advantages in impressive results as well as distribution coverage, a fixed training objective, and scalability.…”
Section: Related Work
confidence: 99%
“…However, to learn various speakers' characteristics for multispeaker TTS, the TTS model requires a sufficient length of recorded speech for each person. Previous works [9,14,15] trained their models using audiobook datasets read by several speakers with enough lengths of utterances, where it is difficult to generalise models for unseen speakers. To solve this problem, we suggest an effective strategy, a speaker feature binding loss, maintaining speaker characteristics of target voices in synthesised speech.…”
Section: Speaker Conditioning With Cross-modal Biometrics
confidence: 99%
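The quotation above describes a speaker feature binding loss only at a high level: a training term that keeps the speaker identity of the synthesised speech close to that of the target voice. The citing paper's exact formulation is not given here; a minimal sketch of one plausible form, assuming the loss is a cosine distance between speaker embeddings extracted from the target and synthesised utterances (all names hypothetical), is:

```python
import numpy as np

def speaker_binding_loss(target_emb: np.ndarray, synth_emb: np.ndarray) -> float:
    """Hypothetical speaker feature binding loss: 1 - cosine similarity
    between the speaker embedding of the target voice and the speaker
    embedding of the synthesised speech. A value of 0 means the two
    embeddings point in the same direction (same speaker identity)."""
    t = target_emb / np.linalg.norm(target_emb)
    s = synth_emb / np.linalg.norm(synth_emb)
    return 1.0 - float(np.dot(t, s))

# Identical-direction embeddings incur no penalty; orthogonal ones are
# penalised, pushing the model to preserve speaker characteristics.
same = speaker_binding_loss(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
diff = speaker_binding_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

In practice such a term would be added to the TTS training objective with a weighting coefficient, with the embeddings produced by a pretrained speaker encoder; the cosine form here is an assumption for illustration, not the paper's stated definition.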