2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
DOI: 10.1109/waspaa.2019.8937165

Parametric Resynthesis With Neural Vocoders

Abstract: Noise suppression systems generally produce output speech with compromised quality. We propose to utilize the high-quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel-spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation mode…

Cited by 15 publications (10 citation statements)
References 29 publications
“…WaveGlow benefits from the best of Glow and WaveNet so as to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. We note that WaveGlow is implemented using only a single network with a single cost function, that is to maximize the likelihood of the training data, which makes the training procedure simple and stable [105].…”
Section: A Speech Analysis and Reconstruction (mentioning)
confidence: 99%
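The single likelihood objective mentioned in the statement above can be written with the standard change-of-variables identity for normalizing flows. This is a sketch in generic notation, not taken from the quoted source: $f_\theta$ (the invertible network), $x$ (the audio), $c$ (the conditioning mel-spectrogram), and $z$ (the latent variable) are assumed symbols.

```latex
% Assumed notation: z = f_\theta(x; c), with z \sim \mathcal{N}(0, I).
\log p_\theta(x \mid c)
  = \log \mathcal{N}\!\bigl(f_\theta(x; c);\, 0,\, I\bigr)
  + \log \left| \det \frac{\partial f_\theta(x; c)}{\partial x} \right|
```

Training maximizes this quantity over the training data. Synthesis samples $z$ from the Gaussian and applies $f_\theta^{-1}$ in a single pass, which is why no auto-regression is needed.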
“…In the subjective test, the official … [Footnote 6] The audio samples of WN were generated by using Yamamoto's open source implementation: https://doi.org/10.5281/zenodo.1472609. This code has been utilized as a reference in several papers [67], [75], [76]. [Footnote 7] The audio samples were brought from Google Drive: https://drive.google.…”
Section: Comparison To Neural Vocoders (mentioning)
confidence: 99%
“…Speech samples are 8-bit mu-law quantized. We use the officially published LPCNet implementation 2 with 640 units in GRU-A and 16 units in GRU-B. We refer to the PR system with LPCNet as its vocoder as PR-LPCNet.…”
Section: Vocoders (mentioning)
confidence: 99%
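The 8-bit mu-law quantization mentioned in the statement above can be illustrated with a minimal sketch. It uses the textbook mu-law companding formula with mu = 255 and is not the LPCNet implementation. The function names `mu_law_encode` and `mu_law_decode` are hypothetical, and input samples are assumed to be floats in [-1, 1].

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compand float samples in [-1, 1] and quantize to integer codes 0..mu."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    # Logarithmic compression: finer resolution near zero amplitude.
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Map [-1, 1] onto the integer range [0, mu] (256 levels for mu = 255).
    return np.round((compressed + 1.0) / 2.0 * mu).astype(np.int64)

def mu_law_decode(codes, mu=255):
    """Invert mu_law_encode, returning approximate float samples in [-1, 1]."""
    compressed = 2.0 * np.asarray(codes, dtype=np.float64) / mu - 1.0
    # Exponential expansion, the inverse of the compression step.
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu

# Usage: quantize a short sine wave and reconstruct it.
t = np.linspace(0.0, 1.0, 100)
signal = 0.5 * np.sin(2.0 * np.pi * 5.0 * t)
codes = mu_law_encode(signal)
reconstructed = mu_law_decode(codes)
```

Because the compression is logarithmic, the reconstruction error is smallest for quiet samples and largest near full scale.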
“…Parametric Resynthesis (PR) systems [2,3] predict clean acoustic parameters from noisy speech and synthesize speech from these predicted parameters using a speech synthesizer or vocoder. Current speech synthesizers are trained to generate high quality speech for a single speaker.…”
Section: Introduction (mentioning)
confidence: 99%