2022
DOI: 10.1109/taslp.2022.3182268

Denoising-and-Dereverberation Hierarchical Neural Vocoder for Statistical Parametric Speech Synthesis

Cited by 5 publications (3 citation statements)
References 39 publications
“…In our previous work, we proposed the HiNet vocoder [37] and its variant [39]. We have also successfully applied the HiNet vocoder to the reverberation modeling task [40] and the denoising-and-dereverberation task [41], [42], respectively. As shown in Figure 1, the HiNet vocoder uses an ASP and a PSP to predict the frame-level log amplitude spectrum and phase spectrum of a waveform, respectively.…”
Section: HiNet (mentioning)
confidence: 99%
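The excerpt above describes HiNet's two-stage structure: an amplitude spectrum predictor (ASP) and a phase spectrum predictor (PSP) that output frame-level log amplitude and phase spectra. The sketch below is an illustration rather than the authors' code; it shows one common way such frame-level spectra can be recombined into a waveform via per-frame inverse FFT and overlap-add, where the frame shift, FFT size, and Hann window are illustrative assumptions.

```python
# Minimal sketch, not the HiNet authors' implementation: recombine frame-level
# log amplitude and phase spectra (as produced by an ASP and a PSP) into a
# waveform. Frame shift, FFT size, and windowing are illustrative assumptions.
import numpy as np

def spectra_to_waveform(log_amplitude, phase, frame_shift=80, fft_size=512):
    # log_amplitude, phase: arrays of shape (num_frames, fft_size // 2 + 1)
    spectrum = np.exp(log_amplitude) * np.exp(1j * phase)   # complex STFT frames
    num_frames = spectrum.shape[0]
    window = np.hanning(fft_size)
    out = np.zeros(frame_shift * (num_frames - 1) + fft_size)
    norm = np.zeros_like(out)
    for t in range(num_frames):
        frame = np.fft.irfft(spectrum[t], n=fft_size)        # back to time domain
        start = t * frame_shift
        out[start:start + fft_size] += frame * window        # synthesis window
        norm[start:start + fft_size] += window ** 2          # assumes a matching analysis window
    return out / np.maximum(norm, 1e-8)
```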
“…• HiNet: The HiNet vocoder [37] we previously proposed. The model configurations were the same as those used in the Baseline-HiNet model of our previous work [42].…”
Section: B. Comparison Among Neural Vocoders (mentioning)
confidence: 99%
“…Here, the amplitude extension model was borrowed from our previous work [39] and included 2 bidirectional gated recurrent unit (GRU)-based recurrent layers, each with 1024 nodes (512 forward and 512 backward), 2 convolutional layers, each with 2048 nodes (filter width = 9), and a feedforward linear output layer with 256 nodes. A generative adversarial network (GAN) with two discriminators, which conducted convolution along the frequency and time axes respectively [39], was applied to the amplitude extension model at the training stage.…”
Section: B. Speech Generation Tasks (mentioning)
confidence: 99%
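For concreteness, the following PyTorch sketch instantiates a network with the layer sizes quoted in the excerpt: two bidirectional GRU layers of 1024 nodes (512 per direction), two convolutional layers of 2048 nodes with filter width 9, and a 256-dimensional linear output layer. It is an assumption-laden illustration rather than the cited authors' code; the input feature dimension, layer ordering, and activations are placeholders, and the GAN training with frequency- and time-axis discriminators mentioned in the excerpt is not shown.

```python
# Sketch of an amplitude extension model with the quoted layer sizes.
# Not the cited authors' code; input_dim, ordering, and ReLU are assumptions.
import torch
import torch.nn as nn

class AmplitudeExtensionModel(nn.Module):
    def __init__(self, input_dim=80):
        super().__init__()
        # Two bidirectional GRU layers: 512 hidden units per direction = 1024 nodes.
        self.gru = nn.GRU(input_dim, 512, num_layers=2,
                          batch_first=True, bidirectional=True)
        # Two 1-D convolutional layers along the time axis, 2048 channels, width 9.
        self.conv = nn.Sequential(
            nn.Conv1d(1024, 2048, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(2048, 2048, kernel_size=9, padding=4), nn.ReLU(),
        )
        # Feedforward linear output layer producing 256 amplitude bins per frame.
        self.out = nn.Linear(2048, 256)

    def forward(self, x):                                   # x: (batch, frames, input_dim)
        h, _ = self.gru(x)                                   # (batch, frames, 1024)
        h = self.conv(h.transpose(1, 2)).transpose(1, 2)     # (batch, frames, 2048)
        return self.out(h)                                   # (batch, frames, 256)
```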