2023
DOI: 10.1109/access.2023.3236242

Speech Enhancement Algorithm Based on a Convolutional Neural Network Reconstruction of the Temporal Envelope of Speech in Noisy Environments

Abstract: Temporal modulation processing is a promising technique for improving the intelligibility and quality of speech in noise. We propose a speech enhancement algorithm to construct the temporal envelope (TEV) in the time-frequency domain by means of an embedded convolutional neural network (CNN). To accomplish this, the input speech signals are divided into sixteen parallel frequency bands (subbands) with bandwidths approximating 1.5 times that of auditory filters. The corrupted TEVs in each subband are extracted …
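The subband front end described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Band edges are spaced on the Glasberg & Moore ERB-number scale so that each of the sixteen bands spans 1.5 ERB-number units, which is approximately 1.5 times the auditory-filter ERB at the band centre. The 80 Hz lower edge, the 16 kHz sampling rate, the second-order RBJ-cookbook band-pass biquad used to isolate each band, and the envelope extractor (full-wave rectification followed by a roughly 50 Hz one-pole low-pass) are all assumptions: the abstract is truncated before it says how the TEVs are extracted, so rectify-and-smooth stands in for whatever the paper actually uses (a Hilbert envelope would be another common choice). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def erb_hz(f: float) -> float:
    """Glasberg & Moore (1990) equivalent rectangular bandwidth (Hz) at f Hz."""
    return 24.7 * (0.00437 * f + 1.0)


def hz_to_erb_number(f: float) -> float:
    """Map frequency (Hz) onto the Glasberg & Moore ERB-number scale."""
    return 21.4 * math.log10(0.00437 * f + 1.0)


def erb_number_to_hz(e: float) -> float:
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def subband_edges(n_bands: int = 16, width_erb: float = 1.5,
                  f_low: float = 80.0) -> List[Tuple[float, float]]:
    """Contiguous bands, each width_erb ERB-number units wide (~width_erb x ERB).

    Assumption (hypothetical layout): bands start at f_low and abut one another.
    """
    e0 = hz_to_erb_number(f_low)
    return [(erb_number_to_hz(e0 + k * width_erb),
             erb_number_to_hz(e0 + (k + 1) * width_erb))
            for k in range(n_bands)]


def bandpass(x: Sequence[float], fs: float, lo: float, hi: float) -> List[float]:
    """Second-order band-pass biquad (RBJ cookbook, 0 dB gain at the centre)."""
    if not 0.0 < lo < hi < fs / 2:
        raise ValueError("band edges must satisfy 0 < lo < hi < fs/2")
    f0 = math.sqrt(lo * hi)                  # geometric band centre
    q = f0 / (hi - lo)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0         # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:                             # direct form I recursion
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def temporal_envelope(x: Sequence[float], fs: float,
                      cutoff: float = 50.0) -> List[float]:
    """TEV via full-wave rectification and a one-pole low-pass near `cutoff` Hz."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    env: List[float] = []
    state = 0.0
    for xn in x:
        state += a * (abs(xn) - state)       # exponential smoothing of |x|
        env.append(state)
    return env


if __name__ == "__main__":
    fs = 16000.0                             # assumed sampling rate
    bands = subband_edges()
    lo, hi = bands[8]
    mid = erb_number_to_hz((hz_to_erb_number(lo) + hz_to_erb_number(hi)) / 2)
    f0 = math.sqrt(lo * hi)
    tone = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(int(0.5 * fs))]
    tev = temporal_envelope(bandpass(tone, fs, lo, hi), fs)
    tail = tev[len(tev) // 2:]
    print(f"band 8: {lo:.0f}-{hi:.0f} Hz, width/ERB = {(hi - lo) / erb_hz(mid):.3f}, "
          f"mean TEV = {sum(tail) / len(tail):.3f}")
```

For a unit-amplitude tone at a band's centre, the smoothed envelope settles near 2/π (the mean of |sin|), while a tone three times higher produces a much smaller envelope in the same band, illustrating the per-band selectivity that subband TEV processing assumes.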

Cited by 13 publications (7 citation statements)
References 50 publications
“…CNNs have been used to achieve state-of-the-art performance in many image and video analysis tasks, such as object detection, image classification, and video classification. [34][35][36]…”
Section: Results (mentioning)
confidence: 99%
“…CNNs have been used to achieve state-of-the-art performance in many image and video analysis tasks, such as object detection, image classification, and video classification. [34][35][36] Analysis of the questionnaires and the audio track recorded during the stop in the restaurant were used to identify noise sources.…”
Section: Keynotes Of the Restaurant Room (mentioning)
confidence: 99%
“…In Soleymanpour et al. (2023), speech enhancement in a single channel was implemented using CNN algorithms for complex noisy speeches to improve the speech quality (Passricha & Aggarwal, 2019) which produces the following result; PESQ = 3.24 (Wang & Wang, 2019; Park & Lee, 2017), CSIG (signal distortion) = 4.34 (Pandey & Wang, 2019; Germain, Chen & Koltun, 2019), CBAK (background noise interference) = 4.10 (Fu et al., 2018; Rownicka, Bell & Renals, 2020), COVL (overall quality of speech) = 3.81 (Rethage, Pons & Serra, 2018), and SSNR (Segmented Signal to Noise Ratio) = 16.85 (Choi et al., 2019). Additionally, CNN was said to be more effective than recursive neural networks (RNNs) (Park & Lee, 2017) and traditional feedforward neural networks (Oord et al., 2016).…”
Section: Research Background (mentioning)
confidence: 99%
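The SSNR figure in the statement above is a frame-averaged signal-to-noise ratio: the clean and enhanced signals are cut into short frames, a per-frame SNR of 10·log10(Σ s² / Σ (s − ŝ)²) is computed, each value is clamped, and the results are averaged. A minimal sketch of that common definition follows. The quoted statement does not say which variant was used, so the 256-sample frames, 50% overlap, and the [-10, 35] dB clamp are assumptions, and `segmental_snr` is a hypothetical name.

```python
import math
from typing import List, Sequence


def segmental_snr(clean: Sequence[float], enhanced: Sequence[float],
                  frame_len: int = 256, hop: int = 128,
                  floor_db: float = -10.0, ceil_db: float = 35.0) -> float:
    """Average of per-frame SNRs in dB, each clamped to [floor_db, ceil_db]."""
    if len(clean) != len(enhanced):
        raise ValueError("clean and enhanced signals must have equal length")
    eps = 1e-12                      # avoids log(0) and division by zero
    frame_snrs: List[float] = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(v * v for v in s)
        error_energy = sum((a - b) * (a - b) for a, b in zip(s, e))
        snr = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    clean = [math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(16000)]
    enhanced = [0.9 * v for v in clean]          # residual error = 0.1 * clean
    print(f"SSNR = {segmental_snr(clean, enhanced):.2f} dB")
```

A residual equal to 10% of the clean signal gives a 20 dB SNR in every frame, so the average is 20 dB. The clamp keeps silent or near-perfect frames from dominating the mean, which also means any SSNR computed this way is bounded by the clamp limits.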
“…In addition, they retain complex spectral structures in the final speech. Feedforward DNNs (FDNNs) [5]–[9], [13], [16], [17], CNNs [18], [19], RNNs [20], [21], Generative Adversarial Network (GANs) [22], [23], and Transformers [24]–[26] are successful DNN approaches for speech enhancement.…”
Section: Ref# (mentioning)
confidence: 99%