2020
DOI: 10.48550/arxiv.2011.12985
Preprint

FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

Abstract: Nowadays more and more applications can benefit from edge-based text-to-speech (TTS). However, most existing TTS models are too computationally expensive and are not flexible enough to be deployed on the diverse variety of edge devices with their equally diverse computational capacities. To address this, we propose FBWave, a family of efficient and scalable neural vocoders that can achieve optimal performance-efficiency trade-offs for different edge devices. FBWave is a hybrid flow-based generative model that …

Cited by 2 publications (4 citation statements)
References 16 publications
“…The gigantic size of the model also limits its ability to compute in real-time. Another notable attempt in this area is flow-based models such as WaveGlow [15], FloWavenet [16], Melflow [17], WaveFlow [18], and FBWAVE [19], etc. They use a single log-likelihood loss to train specially designed invertible models.…”
Section: Introduction (mentioning)
confidence: 99%
“…They use a single log-likelihood loss to train specially designed invertible models. The inference speed is faster than AR models and can even be used on mobile devices after additional development efforts on CPU [19]. However, the unstable training process and unsatisfactory synthesis quality prevent it from being used in industrial applications.…”
Section: Introduction (mentioning)
confidence: 99%
“…The gigantic size of their model also restricts it from achieving real-time computation. Another notable endeavor in this field is made by flow-based models including waveglow [12], flowavenet [13], Melflow [14], waveflow [15] and FBWAVE [16] etc. They apply a single log-likelihood loss to train specially designed invertible models.…”
Section: Introduction (mentioning)
confidence: 99%
“…They apply a single log-likelihood loss to train specially designed invertible models. The inference speed is faster than AR models and can be deployed even on mobile CPU after extra efforts in engineering [16]. However, its unstable training process and unsatisfying synthesis quality prevent it from being deployed in industrial applications.…”
Section: Introduction (mentioning)
confidence: 99%
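
The citing papers above repeatedly characterize flow-based vocoders such as WaveGlow and FBWave as invertible models trained with a single log-likelihood loss. A minimal sketch of that training objective is given below; it uses a hypothetical one-layer affine flow in PyTorch purely for illustration, not the actual WaveGlow or FBWave architecture.

import math
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    """Hypothetical invertible layer: z = x * exp(log_scale) + shift."""
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        z = x * self.log_scale.exp() + self.shift
        # log |det df/dx| of an element-wise affine map is the sum of log scales
        log_det = self.log_scale.sum()
        return z, log_det

def nll_loss(flow, x):
    # Change of variables: log p(x) = log p_z(f(x)) + log |det df/dx|,
    # with a standard-normal prior on the latent z. This single negative
    # log-likelihood is the only training loss, as the citations describe.
    z, log_det = flow(x)
    log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(dim=-1)
    log_px = log_pz + log_det
    return -log_px.mean()

flow = AffineFlow(dim=16)
x = torch.randn(8, 16)      # toy batch standing in for audio frames
loss = nll_loss(flow, x)    # the single log-likelihood objective
loss.backward()

Because the transform is invertible, synthesis simply runs the flow in reverse on latent noise; real vocoders stack many such layers conditioned on mel-spectrogram features.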