This paper proposes a new speech enhancement algorithm based on Gaussian Mixture Modelling (GMM) and the Minimum Mean Square Error (MMSE) criterion, in which no assumption is made about the nature or stationarity of the noise. Neither Voice Activity Detection (VAD) nor any other means is used to estimate the input Signal to Noise Ratio (SNR). The mean vectors of the mixture models of spectral magnitudes, derived from models of the power spectra of speech and of different noise sources, are used to form sets of over-determined systems of equations, one set per candidate noise source, whose solutions lead to the MMSE estimates of the speech and additive noise spectral magnitudes. The corresponding power spectra are then used for noise suppression by Wiener filtering carried out on overlapping frames. The input SNR is estimated and the nature of the noise involved is determined as by-products of the method. Results are compared with codebook constrained methods, which have shown very good results but suffer from long processing times. It is shown that, at the cost of a slightly lower improvement in SNR and PESQ score, the new algorithm reduces the computation time to one fifth, which makes it suitable for practical applications. (Abstract)
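The idea summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the equations, so the sketch assumes that each noisy magnitude spectrum is fitted as a non-negative combination of one speech mean vector and one noise mean vector, with one least-squares problem per (speech, noise) pair. The function names `fit_speech_and_noise`, `wiener_gain` and `estimated_snr_db`, and the clipping of negative gains, are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_speech_and_noise(noisy_mag, speech_means, noise_means_by_type):
    """Pick the (speech mean, noise mean) pair that best explains a frame.

    Assumption (not from the source): the frame is modelled as
    noisy_mag ~= a * speech_mean + b * noise_mean, which gives K equations
    (one per frequency bin) in two unknowns, i.e. an over-determined system.
    """
    best = None
    for noise_type, noise_means in noise_means_by_type.items():
        for s in speech_means:
            for n in noise_means:
                A = np.column_stack([s, n])  # K x 2 system matrix
                gains, *_ = np.linalg.lstsq(A, noisy_mag, rcond=None)
                # Crude non-negativity: clip instead of a constrained solve.
                gains = np.clip(gains, 0.0, None)
                err = float(np.sum((A @ gains - noisy_mag) ** 2))
                if best is None or err < best["error"]:
                    best = {
                        "noise_type": noise_type,
                        "speech_mag": gains[0] * s,
                        "noise_mag": gains[1] * n,
                        "error": err,
                    }
    return best


def wiener_gain(speech_mag, noise_mag, eps=1e-12):
    """Per-bin Wiener gain from the estimated speech and noise power spectra."""
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return speech_pow / (speech_pow + noise_pow + eps)


def estimated_snr_db(speech_mag, noise_mag, eps=1e-12):
    """Input SNR estimate in dB, obtained as a by-product of the fit."""
    speech_energy = np.sum(speech_mag ** 2) + eps
    noise_energy = np.sum(noise_mag ** 2) + eps
    return 10.0 * np.log10(speech_energy / noise_energy)
```

In such a sketch the winning noise candidate serves as the estimate of the noise type, and the gain returned by `wiener_gain` would be applied to the noisy spectrum of each overlapping frame before overlap-add resynthesis.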
In this paper a new iterative method of speech enhancement using Power Spectral Density (PSD) codebooks of clean speech and several types of noise is proposed. The proposed algorithm estimates the PSDs of speech and of noise of unknown nature, and evaluates the input Signal-to-Noise Ratio (SNR), by solving an over-determined set of equations. No Voice Activity Detection (VAD) or other means of noise spectral estimation, such as minimum statistics, is used. The pre-calculated codebooks are tree-structured for the sake of processing speed. The Wiener filter is used in the first instance because of its simplicity. A new variant of the parametric Wiener filter, whose parameters are controlled by the skewness and kurtosis of the estimated clean speech and noise, is also used to further suppress the noise. The results of employing these iterative algorithms are reported and compared for the enhancement of noisy speech with different noise types and different input SNRs.

Keywords: iterative and parametric Wiener filters, PSD codebook, tree-structured codebook, noise estimation, skewness and kurtosis

I. INTRODUCTION

In real environments, the presence of interfering noise greatly degrades the performance of speech communication systems. Several techniques have been developed over the past decades to address this problem, including, for instance, spectral subtraction, Wiener filtering and all-pole modelling non-causal Wiener filtering [1]. Most of these techniques assume that the interfering signal is stationary, additive and not speech-like. Since the required noise statistics can only be estimated during speech pauses, a VAD is needed in single-channel approaches, where only the noisy observation is available. Alternatively, noise estimation based on minimum statistics can be used.
However, performance is poor when the interference is time-varying and also speech-like.

Iterative speech enhancement algorithms perform better at the cost of increased complexity. In [2], Lim and Oppenheim proposed the iterative Wiener filtering (IWF) technique for speech enhancement, in which the estimation of the all-pole parameters of speech in additive white Gaussian noise was posed as a two-step sequential Maximum A-Posteriori (MAP) estimation problem. In [3], Hansen and Clements showed that constraints on the parameter estimation are essential in order to retain the speech-like characteristics of the enhanced speech. In [4], a clustering-based approach, namely the codebook constrained iterative Wiener filtering scheme, was proposed as an alternative method of imposing constraints. Here, the all-pole parameters are constrained to belong to a codebook of clean speech vectors. Apart from successfully defining a convergence criterion, this approach was quite effective in handling several types of speech constraints, such as those between the formants and those due to speaker variability.

In all the above approaches only stationary noise is considered. However, in many practical applications the noise is time-varying and ...