2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021
DOI: 10.1109/waspaa52581.2021.9632770

HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features

Cited by 35 publications (11 citation statements)
References 23 publications
“…In recent years GANs have been successfully applied to a wide range of normal speech processing tasks including, but not limited to: 1) speech synthesis [74], [75]; 2) voice conversion [76], [77]; 3) speech enhancement [78], [79]; 4) code-switching sentence generation [80], [81]; 5) speech emotion recognition [82], [83]; 6) speaker verification [84]-[86] and 7) speech recognition [87]-[89].…”
Section: Corpus (mentioning)
confidence: 99%
“…Methods that operate on the time-frequency domain generally produce audible artifacts due to the use of phase reconstruction algorithms like the Griffin-Lim algorithm [14]. A recent work addresses this with neural-network based vocoders [15], yet its quality is not on par with an end-to-end approach [16]. Alternatively, methods that work on the time domain typically require more training steps [1].…”
Section: Related Work (mentioning)
confidence: 99%
“…The latter is particularly fruitful, as it has been shown to be effective while operating several times faster than real-time. Each layer in P can optionally be locally conditioned on acoustic features as in [16]. This allows the filtering operation to adapt itself as a function of a more global context vector.…”
Section: Extensions Using Black-box Residual Postnets (mentioning)
confidence: 99%
“…Meanwhile, adversarial training promotes plausible system outputs as determined by a discriminator network [7]. Hybrid approaches combining these methodologies have grown increasingly more common [10,17,16], and balance average system performance with output signal plausibility. To this end, we adopt the adversarial loss and multi-scale discriminator architecture in [7].…”
Section: Adversarial Training (mentioning)
confidence: 99%