ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053795
Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram

Abstract: We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of the realistic speech waveform. As our method does not require density distillation used in the conventional teacher-student framework, the entire model can be ea…

Cited by 622 publications (487 citation statements)
References 23 publications
“…Because of the autoregressive architecture, the WaveNet vocoder has a problem with the slow inference speed and thus is limited in its application. To solve this problem, various neural vocoder models have been proposed, and some can synthesize speech waveforms in real-time, even in a restricted environment with mobile CPUs [5,6]. These models are mostly trained on large sets of training data of more than 10 hours.…”
Section: Introduction (mentioning)
confidence: 99%
“…More recently, Parallel WaveGAN [109] has also been proposed to generate high-quality voice using a generative adversarial network. Parallel WaveGAN is a distillation-free and fast waveform generation method, where a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions.…”
Section: A Speech Analysis and Reconstruction (mentioning)
confidence: 99%
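The multi-resolution spectrogram objective referenced in the quote above can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: the function names are my own, and the three (FFT size, hop, window) configurations are illustrative choices in the spirit of the paper, combining a spectral-convergence term with a log-magnitude term at each resolution.

```python
import numpy as np

def stft_magnitude(x, fft_size, hop_size, win_size):
    """Magnitude STFT of a 1-D signal using a Hann window."""
    window = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop_size
    frames = np.stack([x[i * hop_size : i * hop_size + win_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=fft_size, axis=1))

def stft_loss(y_hat, y, fft_size, hop_size, win_size, eps=1e-7):
    """Spectral-convergence + log-magnitude loss at one resolution."""
    S_hat = stft_magnitude(y_hat, fft_size, hop_size, win_size)
    S = stft_magnitude(y, fft_size, hop_size, win_size)
    sc = np.linalg.norm(S - S_hat) / (np.linalg.norm(S) + eps)
    mag = np.mean(np.abs(np.log(S + eps) - np.log(S_hat + eps)))
    return sc + mag

def multi_resolution_stft_loss(y_hat, y,
                               resolutions=((1024, 120, 600),
                                            (2048, 240, 1200),
                                            (512, 50, 240))):
    """Average the single-resolution loss over several STFT settings."""
    return sum(stft_loss(y_hat, y, *r) for r in resolutions) / len(resolutions)
```

Averaging over several window/FFT configurations is what lets the loss constrain both fine temporal structure (short windows) and harmonic structure (long windows) at the same time.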
“…where λ is the hyper-parameter that controls the balance between L_nll and L_pl. As the STFT loss function is able to effectively capture the time-frequency distribution of the realistic speech waveform, especially for the harmonic components [12,13], the entire training process becomes more efficient.…”
Section: STFT-based Power Loss (mentioning)
confidence: 99%
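The λ-weighted combination in the quote above is a simple scalar trade-off between the likelihood term and the auxiliary STFT power loss. A minimal sketch, with hypothetical loss values for illustration:

```python
def total_loss(l_nll, l_pl, lam):
    """Weighted sum from the quoted formulation: L = L_nll + λ · L_pl.
    lam controls how strongly the STFT power loss shapes training."""
    return l_nll + lam * l_pl

# lam = 0 recovers pure likelihood training; larger lam emphasizes
# time-frequency fidelity of the generated waveform.
```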