Conventional single-channel speech enhancement methods implement the analysis-modification-synthesis (AMS) framework in the acoustic frequency domain. Recently, it has been shown that extending this framework to the modulation domain may result in better noise suppression. However, this conclusion has been reached by relying on a minimum statistics approach for the required noise power spectral density (PSD) estimation.

Various noise estimation algorithms have been proposed over the years in the speech and audio processing literature. Among these, the widely used minimum statistics approach is known to introduce a time frame lag in the estimated noise spectrum. This can lead to highly inaccurate PSD estimates when the noise behaviour changes rapidly with time, i.e., for non-stationary noise. Speech enhancement methods which employ these inaccurate noise PSD estimates tend to perform poorly in the noise suppression task and, in the worst cases, may end up deteriorating the noisy speech signal even further. Noise PSD estimation algorithms using a priori information about the noise statistics have been shown to track non-stationary noise better than the conventional algorithms which rely on the minimum statistics approach.

In this thesis, we perform noise suppression in the modulation domain with the noise and speech PSD derived from an estimation scheme which employs a priori information about various speech and noise types. Specifically, codebooks of gain-normalized linear prediction coefficients obtained from training on various speech and noise files are used as the a priori information while performing the estimation of the desired PSD. The PSD estimates derived from this codebook approach are used to obtain a minimum mean square error (MMSE) estimate of the clean speech modulation magnitude spectrum, which is then combined with the phase spectrum of the noisy speech to recover the enhanced speech signal.
The enhanced speech signal is evaluated in various objective experiments. The results of these evaluations indicate improved noise suppression with the proposed codebook-based modulation domain approach over competing approaches, particularly in cases of non-stationary noise.
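As a rough illustration of how a codebook entry of gain-normalized linear prediction coefficients can describe a PSD, the sketch below evaluates the standard all-pole spectrum, gain / |A(e^{jω})|², for one speech entry and one noise entry and sums them into a modelled noisy PSD. The function names, the first-order coefficient values, and the gains are hypothetical and chosen only for illustration. The sketch does not reproduce the thesis's codebook search or its modulation-domain MMSE estimator.

```python
import numpy as np


def lpc_to_psd(lpc, gain, n_fft=512):
    """All-pole PSD gain / |A(e^{jw})|^2 with A(z) = 1 + sum_k a_k z^{-k}.

    `lpc` holds a_1..a_p (hypothetical codebook entry), `gain` is the
    excitation variance, and the PSD is sampled on n_fft // 2 + 1 bins.
    """
    a = np.concatenate(([1.0], np.asarray(lpc, dtype=float)))
    # Zero-padded FFT of the coefficient vector samples A(e^{jw}) on a grid.
    A = np.fft.rfft(a, n=n_fft)
    return gain / np.abs(A) ** 2


def modelled_noisy_psd(speech_lpc, noise_lpc, speech_gain, noise_gain, n_fft=512):
    """Sum of speech and noise all-pole PSDs (additive, uncorrelated noise assumed)."""
    return (lpc_to_psd(speech_lpc, speech_gain, n_fft)
            + lpc_to_psd(noise_lpc, noise_gain, n_fft))


# Hypothetical first-order entries: low-pass "speech", high-pass "noise".
speech_entry = [-0.9]
noise_entry = [0.5]

psd_speech = lpc_to_psd(speech_entry, gain=2.0)
psd_noise = lpc_to_psd(noise_entry, gain=1.0)
psd_noisy = modelled_noisy_psd(speech_entry, noise_entry, 2.0, 1.0)
```

In a codebook-driven scheme, spectra of this form would be compared against the observed noisy spectrum for each pair of entries. The selection criterion and the gain estimation are left out here because the abstract does not specify them.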