Absstroct-This paper focuses on the class of speech enhancement systems which capitalize on the major importance of the short-time spectral amplitude (STSA) of the speech signal in its perception. A system which utilizes a minimum mean-square error (MMSE) STSA estimator is proposed and then compared with other widely used systems which are based on Wiener filtering and the "spectral subtraction" algorithm. In this paper we derive the MMSE STSA estimator, based on modeling speech and noise spectral components as statistically independent Gaussian random variables. We analyze the performance of the proposed STSA estimator and compare it with a STSA estimator derived from the Wiener estimator. We also examine the MMSE STSA estimator under uncertainty of signal presence in the noisy observations. In constructing the enhanced signal, the MMSE STSA estimator is combined with the complex exponential of the noisy phase. It is shown here that the latter is the MMSE estimator of the complex exponential of the original phase, which does not affect the STSA estimation. The proposed approach results in a significant reduction of the noise, and provides enhanced speech with colorless residual noise. The complexity of the proposed algorithm is approximately that of other systems in the discussed class.
Speech enhancement algorithms which are based on estimating the short-time spectral amplitude of the clean speech have better performance when a soft-decision gain modification, depending on the a priori probability of speech absence, is used. In reported works a fixed probability, q, is assumed. Since speech is non-stationary and may not be present in every frequency bin when voiced, we propose a method for estimating distinct values of q for different bins which are tracked in time. The estimation is based on a decision-theoretic approach for setting a threshold in each bin followed by short-time averaging. The estimated q's are used to control both the gain and the update of the estimated noise spectrum during speech presence in a modified MMSE log-spectral amplitude estimator. Subjective tests resulted in higher scores than for the IS-127 standard enhancement algorithm, when pre-processing noisy speech for a coding application.
We propose a new algorithm for enhancing noisy speech which have been degraded by statistidy independent additive noise. The algorithm is based upon modeling the clean speech as a hidden Markov process with mixtures of Gaussian autoregressive (AR) output processes, and the noise process as a sequence of stationary, statistically independent, Gaussian AR vectors. The parameter sets of the models are estimated using training sequences from the clean speech and the noise process. The parameter set of the hidden Markov model is estimated by the segmental k-means algorithm. Given the estimated models, the enhancement of the noisy speech is done by alternate maximization of the likelihood function f the noisy speech, once over all sequences of states and mixture comnents assuming that the clean speech signal is given, and then over all ectors of the original speech using the resulting most probable sequence This alternating maximization is probable sequence of AR models algorithm, and then applying these AR models for constructing a sequence of Wiener filters which are used to enhance the noisy speech.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.