Rahim Soleymanpour scite author profile

Temporal modulation processing is a promising technique for improving the intelligibility and quality of speech in noise. We propose a speech enhancement algorithm to construct the temporal envelope (TEV) in the time-frequency domain by means of an embedded convolutional neural network (CNN). To accomplish this, the input speech signals are divided into sixteen parallel frequency bands (subbands) with bandwidths approximating 1.5 times that of auditory filters. The corrupted TEVs in each subband are extracted and then fed to a 1-dimensional CNN (1-D CNN) model to restore the TEVs distorted by noise. The method is evaluated using 2,700 words from nine different talkers, mixed with speech-spectrum-shaped random noise (SSN) and babble noise at different signal-to-noise ratios. The Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) metrics are used to evaluate the performance of the 1-D CNN algorithm. Results suggest that, compared to a conventional TEV-based speech enhancement algorithm, the 1-D CNN model improves STOI scores by 27% for SSN and 34% for babble noise, and PESQ scores by 19% and 17%, respectively.
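The abstract does not say how the TEV is extracted in each subband, so the following is only a minimal sketch of one common approach: band-pass filter the signal, full-wave rectify it, and smooth the result with a low-pass filter. Everything specific here is an assumption made for illustration, not the authors' method: the single 1 kHz band (the paper uses sixteen), the biquad band-pass design, the 32 Hz smoothing cutoff, the synthetic amplitude-modulated test tone, and the function names `bandpass_biquad` and `temporal_envelope`.

```python
import math

def bandpass_biquad(x, fs, f0, q):
    """Second-order band-pass filter (unity gain at f0); stands in for one subband."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        out = b0 * s + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def temporal_envelope(x, fs, cutoff_hz=32.0):
    """Full-wave rectification followed by a one-pole low-pass smoother."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for s in x:
        state = (1.0 - a) * abs(s) + a * state
        env.append(state)
    return env

# Hypothetical test input: a 1 kHz carrier whose amplitude varies at 4 Hz,
# a rough stand-in for the slow modulations carried by speech.
fs = 16000
signal = [
    (1.0 + 0.8 * math.sin(2.0 * math.pi * 4.0 * n / fs))
    * math.sin(2.0 * math.pi * 1000.0 * n / fs)
    for n in range(fs)
]

band = bandpass_biquad(signal, fs, f0=1000.0, q=2.0)   # one assumed subband
env = temporal_envelope(band, fs)                       # its temporal envelope
```

In this sketch `env` follows the slow 4 Hz amplitude variation and discards the 1 kHz fine structure. In the pipeline the abstract describes, a sequence of this kind, extracted from noisy speech in each of the sixteen subbands, is what the 1-D CNN takes as input.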


Rahim Soleymanpour

Textile-Based Stretchable and Flexible Glove Sensor for Monitoring Upper Extremity Prosthesis Functions

Synthesizing Dysarthric Speech Using Multi-Speaker TTS for Dysarthric Speech Recognition

Enhancement of speech in noise using multi-channel, time-varying gains derived from the temporal envelope

Wide-Range Motion Recognition Through Insole Sensor Using Multi-Walled Carbon Nanotubes and Polydimethylsiloxane Composites

Speech Enhancement Algorithm Based on a Convolutional Neural Network Reconstruction of the Temporal Envelope of Speech in Noisy Environments
