Abstract-Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.
In the framework of the European HearCom project, promising signal enhancement algorithms were developed and evaluated for future use in hearing instruments. To assess the algorithms' performance, five of the algorithms were selected and implemented on a common real-time hardware/software platform. Four test centers in Belgium, The Netherlands, Germany, and Switzerland perceptually evaluated the algorithms. Listening tests were performed with large numbers of normal-hearing and hearing-impaired subjects. Three perceptual measures were used: speech reception threshold (SRT), listening effort scaling, and preference rating. Tests were carried out in two types of rooms. Speech was presented in multitalker babble arriving from one or three loudspeakers. In a pseudo-diffuse noise scenario, only one algorithm, the spatially preprocessed speech-distortion-weighted multi-channel Wiener filtering, provided a SRT improvement relative to the unprocessed condition. Despite the general lack of improvement in SRT, some algorithms were preferred over the unprocessed condition at all tested signal-to-noise ratios (SNRs). These effects were found across different subject groups and test sites. The listening effort scores were less consistent over test sites. For the algorithms that did not affect speech intelligibility, a reduction in listening effort was observed at 0 dB SNR.
A novel Bayesian matrix factorization method for bounded support data is presented. Each entry in the observation matrix is assumed to be beta distributed. As the beta distribution has two parameters, two parameter matrices can be obtained, which matrices contain only nonnegative values. In order to provide low-rank matrix factorization, the nonnegative matrix factorization (NMF) technique is applied. Furthermore, each entry in the factorized matrices, i.e., the basis and excitation matrices, is assigned with gamma prior. Therefore, we name this method as beta-gamma NMF (BG-NMF). Due to the integral expression of the gamma function, estimation of the posterior distribution in the BG-NMF model can not be presented by an analytically tractable solution. With the variational inference framework and the relative convexity property of the log-inverse-beta function, we propose a new lower-bound to approximate the objective function. With this new lower-bound, we derive an analytically tractable solution to approximately calculate the posterior distributions. Each of the approximated posterior distributions is also gamma distributed, which retains the conjugacy of the Bayesian estimation. In addition, a sparse BG-NMF can be obtained by including a sparseness constraint to the gamma prior. Evaluations with synthetic data and real life data demonstrate the good performance of the proposed method.
In this letter the focus is on linear filtering of speech before degradation due to additive background noise. The goal is to design the filter such that the speech intelligibility index (SII) is maximized when the speech is played back in a known noisy environment. Moreover, a power constraint is taken into account to prevent uncomfortable playback levels and deal with loudspeaker constraints. Previous methods use linear approximations of the SII in order to find a closed-form solution. However, as we show, these linear approximations introduce errors in low SNR regions and are therefore suboptimal. In this work we propose a nonlinear approximation of the SII which is accurate for all SNRs. Experiments show large intelligibility improvements with the proposed method over the unprocessed noisy speech and better performance than one state-of-the art method.Index Terms-Near-end enhancement, speech enhancement, speech intelligibility, speech intelligibility index.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.