2020
DOI: 10.48550/arxiv.2005.07412
Preprint

WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Abstract: In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are jointly trained by maximizing the likelihood of the training data and optimizing loss functions on the frequency domains. As we design a flow-based model that is heavily compressed, the proposed model requires much less computational resources compared to other waveform generation models during both training and inf…

Citations: cited by 2 publications (2 citation statements)
References: 21 publications (37 reference statements)
“…Neural text-to-speech (TTS) has achieved remarkable audio qualities recently [1,2]. Most recent TTS systems comprise two cascaded separated modules: synthesizer [1][2][3][4] and vocoder [5][6][7][8][9][10][11][12][13]. In the first module, the synthesizer takes text as input and outputs an audio mid-representation.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…As a result, TTS research has focused on finding alternative vocoder architectures such as Parallel-WaveNet (Oord et al 2018), WaveRNN (Kalchbrenner et al 2018), ClariNet (Ping, Peng, and Chen 2018) and WaveGlow (Prenger, Valle, and Catanzaro 2019) that achieve higher performance when deployed on existing hardware. There is a degree of ambiguity as to the highest quality vocoder as audio quality evaluation is subjective but all authors agree that WaveNet produces at least as good if not higher quality audio than the more recent approaches (Kim et al 2018; Oord et al 2018; Prenger, Valle, and Catanzaro 2019; Tian et al 2020; Hsu and Lee 2020).…”
Section: Introduction
Citation type: mentioning
confidence: 99%