Frederico Santos de Oliveira scite author profile

Frederico Santos de Oliveira

5 Publications

79 Citation Statements Received

49 Citation Statements Given

Affiliations

Universidade Federal de Goiás, Universidade Federal de Mato Grosso, Federal University of Lavras

Publications

SC-GlowTTS: An Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

Casanova, Shulby, Golge et al. 2021

Most Zero-shot Multi-speaker TTS (ZS-TTS) systems support only a single language. Although models like YourTTS, VALL-E X, Mega-TTS 2, and Voicebox explored Multilingual ZS-TTS, they are limited to just a few high/medium resource languages, which restricts their application in most low/medium resource languages. In this paper, we aim to alleviate this issue by proposing and making publicly available the XTTS system. Our method builds upon the Tortoise model and adds several novel modifications to enable multilingual training, improve voice cloning, and enable faster training and inference. XTTS was trained in 16 languages and achieved state-of-the-art (SOTA) results in most of them.

CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Cândido, Casanova, Soares et al. 2021

Preprint

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

Casanova, Shulby, Golge et al. 2021

Preprint

In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen in training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transformer-based encoder. Additionally, we have shown that adjusting a GAN-based vocoder for the spectrograms predicted by the TTS model on the training dataset can significantly improve the similarity and speech quality for new speakers. Our model is able to converge in training, using only 11 speakers, reaching state-of-the-art results for similarity with new speakers, as well as high speech quality.

TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese

Casanova, Cândido, Shulby et al. 2022

Lang Resources & Evaluation

Brazilian Portuguese Speech Recognition Using Wav2vec 2.0

Gris, Casanova, Oliveira et al. 2022
