2021
DOI: 10.48550/arxiv.2109.13731
Preprint

VoiceFixer: Toward General Speech Restoration with Neural Vocoder

Haohe Liu,
Qiuqiang Kong,
Qiao Tian
et al.

Abstract: Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on single-task speech restoration (SSR), such as speech denoising or speech declipping. However, SSR systems only focus on one task and do not address the general speech restoration problem. In addition, previous SSR systems show limited performance in some speech restoration tasks such as speech super-resolution. To overcome those limitations, we propose a general speech restoration (GSR) task that attempts to remove m…
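The abstract names denoising, declipping, and super-resolution as single-task problems, while a GSR system is meant to handle more than one task with one model (the abstract is truncated before it states the full definition). As a hedged sketch, not the paper's data pipeline, the code below simulates three matching distortions on a NumPy signal: additive noise, hard clipping, and bandwidth limitation. The function names, the processing order, and every default parameter are hypothetical choices for illustration.

```python
import numpy as np
from scipy.signal import resample_poly


def add_noise(x, snr_db, rng):
    """Add white Gaussian noise scaled to the requested SNR in dB."""
    noise = rng.standard_normal(x.shape)
    p_signal = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale so that 10 * log10(p_signal / scaled_noise_power) == snr_db
    noise *= np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return x + noise


def clip(x, ratio):
    """Hard-clip at `ratio` times the peak amplitude (0 < ratio <= 1)."""
    threshold = ratio * np.max(np.abs(x))
    return np.clip(x, -threshold, threshold)


def band_limit(x, factor):
    """Remove content above fs / (2 * factor) by decimating, then upsampling."""
    low = resample_poly(x, 1, factor)  # anti-aliased decimation
    return resample_poly(low, factor, 1)[: len(x)]


def degrade(x, rng, snr_db=10.0, clip_ratio=0.5, factor=4):
    """Hypothetical degradation chain: band-limit, then clip, then add noise."""
    y = band_limit(x, factor)
    y = clip(y, clip_ratio)
    return add_noise(y, snr_db, rng)
```

Applying `degrade` to a clean recording gives a (degraded, clean) pair of the kind a restoration model could be trained on; the order of the three steps here is arbitrary.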

Citations: cited by 7 publications (8 citation statements)
References: 58 publications
“…The estimated speech signal obtained may be distorted due to oversubtraction. We will then use a pre-trained VoiceFixer model [17], which shows great performance in restoring strongly degraded human speech, to reconstruct this. Another crucial factor in Eq.6 is the matching 𝑡 between the robot speech and the reference speech.…”
Section: Audio (mentioning; confidence: 99%)
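The quoted passage depends on the citing paper's Eq. 6, which this page does not reproduce. As a generic, hedged illustration of why over-subtraction distorts an estimate, the sketch below uses textbook magnitude spectral subtraction with an over-subtraction factor `alpha` and a spectral floor `beta`. This is a swapped-in technique, not the citing paper's method; `spectral_subtract` and its defaults are hypothetical, and the two inputs are assumed to be time-aligned already.

```python
import numpy as np
from scipy.signal import stft, istft


def spectral_subtract(mix, ref, fs, alpha=2.0, beta=0.01, nperseg=512):
    """Magnitude spectral subtraction with over-subtraction factor alpha.

    Subtracts alpha * |REF| from |MIX| in every time-frequency bin, floors
    the result at beta * |MIX|, and reuses the phase of the mixture.
    """
    if mix.shape != ref.shape:
        raise ValueError("mix and ref must be time-aligned and equal length")
    _, _, M = stft(mix, fs=fs, nperseg=nperseg)
    _, _, R = stft(ref, fs=fs, nperseg=nperseg)
    mag = np.abs(M) - alpha * np.abs(R)
    mag = np.maximum(mag, beta * np.abs(M))  # spectral floor
    S = mag * np.exp(1j * np.angle(M))       # keep the mixture phase
    _, s = istft(S, fs=fs, nperseg=nperseg)
    return s[: len(mix)]                     # drop STFT padding
```

With `alpha > 1` more of the reference is removed, but more bins are driven to the floor; the resulting spectral holes are the kind of strong degradation the quoted work then hands to a pre-trained VoiceFixer model.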
“…NU-Wave 2 [23] further improves NU-Wave [49] from two aspects. On the one hand, NU-Wave 2 [23] adopts short-time Fourier convolution (STFC) to overcome the limitations of NU-Wave [49] that fails to generate harmonics of vowels [49] and various frequency bands [57,72]. On the other hand, different from prior models that the initial and target sampling rates are fixed, NU-Wave 2 [23] defines a new task general neural audio upsampling that the inputs can be any sampling rate for a single model.…”
Section: Unsupervised Restoration (mentioning; confidence: 99%)
“…where $\mathcal{F}$ and $\mathcal{F}^{-1}$ refer to the Fourier transform and its inverse operation, respectively, $H_{\phi_j}$ is a zero-phase frequency-domain filter computed through (10) using the parameters in $\phi_j$, and $\odot$ is the Hadamard product, or element-wise multiplication. This operation is, in practice, realized in a frame-by-frame manner using a short-time Fourier transform, using a Hamming window length of 4096 samples and a hop size of 2048 samples.…”
Section: Joint Posterior Sampling and Filter Inference (mentioning; confidence: 99%)
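The last quoted passage describes applying a zero-phase filter by element-wise multiplication in the frequency domain, frame by frame, with a 4096-sample Hamming window and a 2048-sample hop. A minimal sketch of that framing with SciPy follows. Equation (10), which defines the filter from $\phi_j$, is not shown on this page, so `gain_fn` is a stand-in and `butterworth_like_lowpass` is a hypothetical example gain; a real-valued gain leaves each bin's phase unchanged, which is what makes the filter zero-phase.

```python
import numpy as np
from scipy.signal import stft, istft


def zero_phase_stft_filter(x, gain_fn, fs, n_win=4096, hop=2048):
    """Multiply every STFT frame of x by a real-valued (zero-phase) gain.

    gain_fn maps bin frequencies in Hz to real gains; it stands in for the
    parameterized filter H_{phi_j}, whose definition is not reproduced here.
    """
    noverlap = n_win - hop
    freqs, _, X = stft(x, fs=fs, window="hamming", nperseg=n_win,
                       noverlap=noverlap)
    H = np.asarray(gain_fn(freqs), dtype=float)  # shape (n_bins,)
    Y = X * H[:, np.newaxis]                     # Hadamard product per frame
    _, y = istft(Y, fs=fs, window="hamming", nperseg=n_win,
                 noverlap=noverlap)
    return y[: len(x)]                           # drop STFT padding


def butterworth_like_lowpass(cutoff_hz, order=4):
    """Hypothetical example gain: Butterworth magnitude response, zero phase."""
    def gain(f):
        return 1.0 / np.sqrt(1.0 + (f / cutoff_hz) ** (2 * order))
    return gain
```

With a gain of one everywhere the function returns the input unchanged, because a Hamming window at 50% overlap satisfies the overlap-add condition that `istft` relies on. Multiplying frames without zero-padding amounts to circular convolution, which is acceptable here only because a smooth gain has an impulse response much shorter than 4096 samples.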