2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru.2017.8269007
An Investigation of Multi-Speaker Training for WaveNet Vocoder

Cited by 108 publications (116 citation statements). References 17 publications.
“…The NU VC system uses a WaveNet-based vocoder [17,18,19] to model the waveform of the target speaker and generate the converted waveform using estimated speech features. Several flows are used in producing the estimated spectral features, where the direct waveform modification [2] method is employed.…”
Section: Waveform-Processing Module
confidence: 99%
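The direct waveform modification referenced in this excerpt filters the source speaker's waveform with a spectral differential instead of resynthesizing speech from a vocoder; the method in [2] is typically realized with an MLSA filter driven by mel-cepstral differentials. The following is only a minimal frequency-domain sketch: the function and variable names are illustrative, and the per-frame FFT filtering stands in for the MLSA filter.

```python
import numpy as np

def differential_filter_frame(src_frame, src_env, conv_env, eps=1e-8):
    # Illustrative simplification of direct waveform modification:
    # filter one windowed source frame by the ratio of the converted
    # spectral envelope to the source spectral envelope.
    spec = np.fft.rfft(src_frame)
    gain = conv_env / np.maximum(src_env, eps)  # differential spectral envelope
    return np.fft.irfft(spec * gain, n=len(src_frame))
```

Frames processed this way would then be overlap-added to obtain the converted waveform.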
“…On the other hand, in the handling of prosodic parameters, such as fundamental frequency (F0), several methods have been commonly used, including a simple mean/variance linear transformation, a contour-based transformation [13], GMM-based mapping [14], and neural networks [15]. For waveform generation, approaches include the source-filter vocoder system [16], the latest direct waveform modification technique [2], and the use of state-of-the-art WaveNet modeling [17,18,19].…”
Section: Introduction
confidence: 99%
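The simple mean/variance linear transformation mentioned in this excerpt is conventionally applied to log-scale F0: the source contour is normalized by the source speaker's log-F0 statistics and rescaled to the target speaker's. A minimal sketch, with illustrative names and statistics assumed to be computed over voiced frames only:

```python
import numpy as np

def convert_log_f0(lf0_src, mu_src, sigma_src, mu_tgt, sigma_tgt):
    # Normalize by the source speaker's log-F0 mean/standard deviation,
    # then rescale to the target speaker's statistics.
    return (lf0_src - mu_src) / sigma_src * sigma_tgt + mu_tgt
```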
“…To address this issue, many neural-based vocoders [18][19][20][21][22][23] have been proposed to replace the traditional vocoders in the synthesis part of VC. In this paper, we focus on the WaveNet (WN) vocoder [18][19][20][21], which is an autoregressive model conditioned on auxiliary features to generate a raw waveform without many handcrafted assumptions. Although the WN vocoder generates high-fidelity speech conditioned on the training acoustic features, the fixed network architecture of WN is not efficient and may reduce robustness against unseen fundamental frequency (F0) features that fall outside the range of the training data.…”
Section: Introduction
confidence: 99%
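As this excerpt notes, the WaveNet vocoder is autoregressive: each waveform sample is predicted from previously generated samples together with frame-level acoustic features upsampled to the sample rate. Below is a minimal sketch of that generation loop; the `model` callable stands in for a trained network, and the names, hop-based upsampling, and receptive-field truncation are assumptions for illustration.

```python
import numpy as np

def wavenet_generate(model, aux_features, hop_size, receptive_field=1024):
    """Sample-by-sample generation conditioned on acoustic features.

    model(history, cond_vector) is assumed to return a probability
    distribution over mu-law quantized sample classes.
    """
    # Repeat frame-level conditioning (e.g. mel-cepstra, F0) to sample level.
    cond = np.repeat(aux_features, hop_size, axis=0)
    samples = np.zeros(len(cond), dtype=np.int64)
    for t in range(1, len(cond)):
        history = samples[max(0, t - receptive_field):t]
        probs = model(history, cond[t])
        samples[t] = np.random.choice(len(probs), p=probs)  # draw the next sample
    return samples
```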
“…In recent years, two main research directions have emerged to improve the waveform generation module. One direction is to develop neural vocoders [11,12,13,14,15,16,17,18,19], which are capable of reconstructing the phase and excitation information and thus generate extremely natural-sounding speech.…”
Section: Introduction
confidence: 99%