Interspeech 2017
DOI: 10.21437/interspeech.2017-314

Speaker-Dependent WaveNet Vocoder

Abstract: In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing speech waveforms with WaveNet, by utilizing acoustic features from an existing vocoder as auxiliary features of WaveNet. It is expected that WaveNet can learn a sample-by-sample correspondence between the speech waveform and acoustic features. The advantage of the proposed method is that it does not require (1) explicit modeling of excitation signals and (2) various assumptions, which are based on prior knowledge specific to speech…
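
To make the "sample-by-sample correspondence" above concrete, here is a minimal sketch (not taken from the paper; the 8-bit mu-law quantization, 16 kHz sampling rate, 5 ms frame shift, and feature dimension are illustrative assumptions) that pairs each quantized waveform sample with a frame-level acoustic feature vector by repeating the features up to the sample rate:

```python
import numpy as np

def mulaw_quantize(x, channels=256):
    """Mu-law companding followed by uniform quantization (assumed 8-bit)."""
    mu = channels - 1
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)   # compressed to [-1, 1]
    return ((y + 1.0) / 2.0 * mu + 0.5).astype(np.int64)       # integer codes in [0, mu]

def align_features_to_samples(feats, hop_length):
    """Repeat frame-level acoustic features so each waveform sample gets one vector."""
    return np.repeat(feats, hop_length, axis=0)

# Toy usage: 1 s of audio at 16 kHz with a 5 ms frame shift (80 samples per frame).
sr, hop = 16000, 80
wav = np.random.uniform(-1.0, 1.0, sr)        # placeholder waveform
feats = np.random.randn(sr // hop, 28)        # placeholder frame-level vocoder features
targets = mulaw_quantize(wav)                 # categorical prediction targets
cond = align_features_to_samples(feats, hop)[: len(wav)]   # per-sample conditioning
assert cond.shape[0] == targets.shape[0]
```

In a WaveNet vocoder of this kind, something like `cond` would be fed as the auxiliary conditioning input while `targets` serve as the sample-level prediction targets.
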

Cited by 281 publications (234 citation statements). References 12 publications.

Citation statements (ordered by relevance):

“…Nevertheless, this information is crucial for the inversion from the frequency domain back into a temporal signal. Recent studies show that high quality speech waveforms can be synthesized by using Wavenet [46] conditioned on acoustic features estimated from a mel-cepstrum vocoder [47]. During network training, the model learns the link between speech signal and its acoustic features automatically without making any assumptions about prior knowledge of speech.…”
Section: WaveNet Vocoder for the Reconstruction of Audible Waveforms
Citation type: mentioning
confidence: 99%
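
As background for the "acoustic features estimated from a mel-cepstrum vocoder" in this statement, the following is a minimal feature-extraction sketch using the WORLD analyzer (pyworld) and SPTK's spectrum-to-mel-cepstrum conversion (pysptk); the mel-cepstral order, all-pass constant, and frame period are illustrative choices, not the cited papers' exact settings:

```python
import numpy as np
import pyworld
import pysptk

def extract_features(wav, sr, order=24, alpha=0.42, frame_period=5.0):
    """WORLD analysis -> F0, mel-cepstrum, aperiodicity (illustrative settings)."""
    wav = wav.astype(np.float64)
    f0, t = pyworld.harvest(wav, sr, frame_period=frame_period)   # F0 contour
    sp = pyworld.cheaptrick(wav, f0, t, sr)                       # spectral envelope
    ap = pyworld.d4c(wav, f0, t, sr)                              # aperiodicity
    mcep = pysptk.sp2mc(sp, order, alpha)                         # mel-cepstrum
    return f0, mcep, ap
```

Features of this kind, stacked frame by frame, are what the WaveNet vocoder is conditioned on.
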
“…The NU VC system uses a WaveNet-based vocoder [17,18,19] to model the waveform of the target speaker and generate the converted waveform using estimated speech features. Several flows are used in producing the estimated spectral features, where the direct waveform modification [2] method is employed.…”
Section: Waveform-Processing Module
Citation type: mentioning
confidence: 99%
“…On the other hand, in the handling of prosodic parameters, such as fundamental frequency (F0), several methods have been commonly used including a simple mean/variance linear transformation, a contour-based transformation [13], GMM-based mapping [14], and neural network [15]. For waveform generation, approaches include the source-filter vocoder system [16], the latest direct waveform modification technique [2], and the use of state-of-the-art WaveNet modeling [17,18,19].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
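
The "simple mean/variance linear transformation" of F0 mentioned here is commonly applied to the log-F0 of voiced frames; a minimal sketch, assuming per-speaker log-F0 statistics computed beforehand (the function and variable names are mine, not from the cited papers):

```python
import numpy as np

def convert_f0(f0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Mean/variance linear transformation of log-F0 (voiced frames only)."""
    f0_conv = np.zeros_like(f0_src, dtype=np.float64)
    voiced = f0_src > 0
    lf0 = np.log(f0_src[voiced])
    f0_conv[voiced] = np.exp((lf0 - src_mean) / src_std * tgt_std + tgt_mean)
    return f0_conv   # unvoiced frames (F0 == 0) are left at zero
```
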
“…More recently, deep learning techniques have reshaped the way speech synthesis is done. Many neural waveform synthesizers surpass traditional parametric synthesis models in speech quality [6,7]. These waveform synthesizers avoid many speech-specific assumptions by using generic neural networks, e.g., the convolution net in WaveNet [6] and recurrent net in SampleRNN [8].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
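
The "convolution net in WaveNet" referred to in this statement is a stack of dilated causal convolutions; below is a minimal PyTorch sketch (channel count, kernel size, dilation pattern, and the simplified residual connection are illustrative, not the full WaveNet architecture) showing how such a stack covers a wide receptive field without recurrence:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalStack(nn.Module):
    """Stack of dilated causal 1-D convolutions (illustrative, not the full WaveNet)."""
    def __init__(self, channels=64, kernel_size=2, dilations=(1, 2, 4, 8, 16)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size, dilation=d) for d in dilations
        )
        self.kernel_size = kernel_size

    def forward(self, x):                      # x: (batch, channels, time)
        for conv in self.convs:
            pad = (self.kernel_size - 1) * conv.dilation[0]
            h = conv(F.pad(x, (pad, 0)))       # left-pad only -> causal convolution
            x = x + torch.tanh(h)              # simplified residual connection
        return x

# Receptive field with kernel 2 and dilations 1..16: 1 + (1+2+4+8+16) = 32 samples.
y = DilatedCausalStack()(torch.zeros(1, 64, 100))
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth, which is what allows a purely convolutional model to stand in for recurrence over long sample sequences.
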