2019
DOI: 10.1109/taslp.2019.2906484

GlotNet—A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis

Cited by 43 publications (33 citation statements)
References 35 publications
“…However, similar studies reporting variations in the WaveNet performance (depending on speaker and acoustic features) are found in the literature: MOS scores around 3.5 have been reported when using mel-filterbank [36] or mel-cepstrum [7] acoustic features. Meanwhile, our version of WaveNet has previously achieved MOS scores above 4 using glottal vocoder acoustic features [17]. The performance of a parallel waveform generator is likely to increase with the use of a separate pitch predictor model [15].…”
Section: Discussion (mentioning)
confidence: 99%
“…However, the intended use of phase recovery in Tacotron-based synthesis is generally not high-quality synthesis, but rather a development tool to quickly check whether the generated acoustic features are meaningful. For a reference autoregressive WaveNet vocoder, we trained a speaker dependent model conditioned on mel-spectrograms, following the "Wave-30" configuration from [17]. The system uses a dilation cycle of 10 (e.g., 1, 2, ..., 512) repeated three times, 64 residual channels, and 256 post-net channels.…”
Section: Reference Methods (mentioning)
confidence: 99%
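The dilation pattern quoted in the statement above (a cycle of 10 dilations, 1 through 512, repeated three times) can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the kernel size of 2 is an assumption borrowed from the usual WaveNet design, since the quoted text does not state it.

```python
def dilation_schedule(cycle_length=10, num_cycles=3):
    """Return dilations 1, 2, 4, ..., 2**(cycle_length - 1), repeated num_cycles times."""
    return [2 ** i for _ in range(num_cycles) for i in range(cycle_length)]


def receptive_field(dilations, kernel_size=2):
    """Receptive field (in samples) of a stack of causal dilated convolutions.

    kernel_size=2 is an assumption; the quoted statement does not give it.
    Each layer extends the receptive field by (kernel_size - 1) * dilation samples.
    """
    return (kernel_size - 1) * sum(dilations) + 1


dilations = dilation_schedule()
print(len(dilations))              # number of dilated layers in the stack
print(max(dilations))              # largest dilation in one cycle
print(receptive_field(dilations))  # samples of past context seen by the stack
```

Under these assumptions the schedule has 30 layers and covers 3070 samples of context. The 64 residual channels and 256 post-net channels mentioned in the statement affect layer width, not the receptive field, so they do not appear in this calculation.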
“…There have been several attempt to incorporate the LP filter into autoregressive neural vocoding systems. For instance, GlotNet and ExcitNet used the WaveNet structure to generate the glottal excitation [9,10]. In case of the LPCNet, it employed the lightweight WaveRNN model for fast generation of the excitation.…”
Section: Relationship To Prior Work (mentioning)
confidence: 99%
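The statement above refers to systems in which a neural model generates an excitation signal that is then shaped by a linear prediction (LP) filter. As a rough illustration of that second step only, the sketch below applies a textbook all-pole LP synthesis recursion to an excitation sequence. The function name and the coefficient sign convention are assumptions made for this example; none of it is taken from GlotNet, ExcitNet, or LPCNet.

```python
def lp_synthesis(excitation, predictor_coeffs):
    """All-pole LP synthesis: s[n] = e[n] + sum_k a_k * s[n - k].

    Assumed convention: predictor_coeffs[k - 1] holds a_k, the weight on the
    sample k steps in the past. Samples before the start are treated as zero.
    """
    speech = []
    for n, e_n in enumerate(excitation):
        prediction = 0.0
        for k, a_k in enumerate(predictor_coeffs, start=1):
            if n - k >= 0:
                prediction += a_k * speech[n - k]
        speech.append(e_n + prediction)
    return speech


# A unit impulse through a first-order filter gives a decaying response.
impulse_response = lp_synthesis([1.0, 0.0, 0.0, 0.0], [0.5])
print(impulse_response)
```

In an excitation-based vocoder the `excitation` argument would come from the neural waveform model rather than a fixed list, and the coefficients would typically change frame by frame; both are simplified away here.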