2020
DOI: 10.1109/access.2020.3029417
Improved Relativistic Cycle-Consistent GAN With Dilated Residual Network and Multi-Attention for Speech Enhancement

Abstract: Generative adversarial networks (GANs) have been increasingly used as feature mapping functions in speech enhancement, in which the noisy speech features are transformed to the clean ones through the generators. This paper proposes a novel speech enhancement model based on a cycle-consistent relativistic GAN with Dilated Residual Networks and a Multi-attention mechanism. Using the adversarial loss, improved cycle-consistency losses, and an identity-mapping loss, a noisy-to-clean generator G and an inverse clea…

Cited by 15 publications (12 citation statements)
References 41 publications
“…Traditional Optical Character Recognition (OCR) is based on image processing (minimization, texture analysis, connected domain analysis, etc.) [2][3][4]. Modern business bills are of many types and placed randomly, so it is difficult to achieve good recognition results with traditional OCR detection methods.…”
Section: Introduction
Confidence: 99%
“…The AHA module integrates hierarchical feature maps from all outputs of the ATFA modules and fuses the global context of all intermediate outputs with different weights, thus helping to guide the feature learning progressively. The output channels of each layer in the encoder are (16, 32, 64), while the kernel size and strides are set to (3, 5) and (1, 2) along the time and frequency axes, respectively.…”
Section: Network Architecture
Confidence: 99%
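The encoder described in the statement above can be sketched in a few lines. This is a hypothetical PyTorch reconstruction from the stated hyperparameters only — three convolutional layers with output channels (16, 32, 64), kernel size (3, 5), and strides (1, 2) along the time and frequency axes; the class name, activation, and padding are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy encoder matching the cited hyperparameters (assumed layout)."""
    def __init__(self, in_channels=1):
        super().__init__()
        layers = []
        for out_channels in (16, 32, 64):
            layers += [
                nn.Conv2d(in_channels, out_channels,
                          kernel_size=(3, 5),  # (time, frequency)
                          stride=(1, 2),       # downsample frequency only
                          padding=(1, 2)),     # "same"-style padding (assumed)
                nn.PReLU(),
            ]
            in_channels = out_channels
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, channel, time, freq)
        return self.net(x)

enc = Encoder()
spec = torch.randn(2, 1, 100, 160)  # toy spectrogram batch
out = enc(spec)
print(out.shape)  # time axis preserved, frequency halved by each strided layer
```

With stride 2 on the frequency axis only, the frequency dimension is halved three times (160 → 80 → 40 → 20) while the time dimension is kept, which matches the stated (1, 2) strides.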
“…To tackle this problem, the cycle-consistent GAN (CycleGAN), originally developed for unpaired image-to-image translation [13], has been adopted for unsupervised SE. In the SE area, CycleGAN-based approaches demonstrate a remarkable ability to preserve the speech structure and improve speech quality in both paired and unpaired cases [14,15,16,17,18]. Nevertheless, these conventional CycleGAN-based approaches have several limitations for non-parallel SE.…”
Section: Introduction
Confidence: 99%
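The cycle-consistency idea referenced in that statement can be illustrated with a minimal NumPy sketch. The generators G (noisy → clean) and F (clean → noisy) are placeholder linear maps chosen to be exact inverses so the cycle closes — not the paper's networks — and the L1 cycle loss follows the standard CycleGAN formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder "generators": invertible linear maps on 8-dim toy features.
W_g = np.eye(8) + 0.1 * rng.standard_normal((8, 8))
W_f = np.linalg.inv(W_g)  # exact inverse, so F undoes G by construction

def G(x):  # noisy -> clean (placeholder)
    return x @ W_g

def F(x):  # clean -> noisy (placeholder)
    return x @ W_f

def cycle_consistency_loss(noisy, clean):
    # L_cyc = ||F(G(noisy)) - noisy||_1 + ||G(F(clean)) - clean||_1
    return (np.abs(F(G(noisy)) - noisy).mean()
            + np.abs(G(F(clean)) - clean).mean())

noisy = rng.standard_normal((4, 8))
clean = rng.standard_normal((4, 8))
loss = cycle_consistency_loss(noisy, clean)
print(loss)  # near zero, since F is the exact inverse of G
```

In training, minimizing this loss constrains the noisy → clean → noisy round trip to reconstruct the input, which is what lets CycleGAN-based SE preserve speech structure without parallel noisy/clean pairs.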
“…Over the last decade, deep learning has been widely utilized for SE and has demonstrated exceptional denoising capability even in challenging conditions such as nonstationary noise and unseen or low-SNR conditions. Supervised deep neural networks (DNNs), including feedforward multilayer perceptrons [15], convolutional neural networks [19], recurrent neural networks [20], and generative adversarial networks [21], have significantly elevated the performance of SE by capturing the complicated relationship between noisy and clean speech. There has also been an increase in the use of modified versions of conventional neural networks.…”
Section: Introduction
Confidence: 99%