This paper addresses a challenging single-channel speech enhancement problem in real-world environments, where the speech signal is corrupted by a high level of background noise. While most state-of-the-art algorithms try to estimate the noise spectral power and filter it out of the observed spectrum to obtain enhanced speech, this paper presents an alternative approach inspired by audio source separation techniques. In the considered method, generic spectral characteristics of speech and noise are first learned from various training signals by non-negative matrix factorization (NMF). They are then used to guide a similar factorization of the observed power spectrogram into a speech part and a noise part. Additionally, we propose to combine two existing group sparsity-inducing penalties in the optimization process and adapt the corresponding parameter estimation algorithm based on the multiplicative update (MU) rule. Experimental results over different settings confirm the effectiveness of the proposed approach.
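The training stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the generalized Kullback-Leibler divergence, although the paper may use another divergence such as Itakura-Saito. The helper names `nmf_kl` and `learn_generic_model`, the per-example rank `K_per_example`, and the unit-sum column normalization are all hypothetical choices. Each training power spectrogram is factorized separately, and the learned dictionaries are concatenated into one generic model. The column blocks of that model remain identifiable through a `groups` label array.

```python
import numpy as np


def nmf_kl(V, K, n_iter=200, seed=0, eps=1e-12):
    """Unpenalized NMF, V ~= W @ H, using the standard multiplicative updates
    for the generalized Kullback-Leibler divergence (assumed divergence)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # H <- H * (W^T (V / WH)) / (W^T 1)
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        # W <- W * ((V / WH) H^T) / (1 H^T)
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def learn_generic_model(spectrograms, K_per_example, n_iter=200):
    """Hypothetical helper: factorize each training power spectrogram on its own,
    then stack the learned dictionaries side by side. `groups[k]` records which
    training example (block) column k of the generic model came from."""
    blocks = []
    for i, V in enumerate(spectrograms):
        W, _ = nmf_kl(V, K_per_example, n_iter=n_iter, seed=i)
        # Unit-sum columns remove the W/H scale ambiguity (assumed normalization)
        blocks.append(W / W.sum(axis=0, keepdims=True))
    W_generic = np.hstack(blocks)
    groups = np.repeat(np.arange(len(blocks)), K_per_example)
    return W_generic, groups


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Random stand-ins for |STFT|^2 of three training signals
    train = [rng.random((257, 100)) for _ in range(3)]
    W_noise, groups = learn_generic_model(train, K_per_example=8)
    print(W_noise.shape, groups[:10])
```

The block labels in `groups` let a group sparsity penalty switch whole training examples on or off during enhancement.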
This paper considers the single-channel speech separation problem given a noisy observation recorded by a microphone. More precisely, we focus on the speaker-dependent approach, where the spectral characteristics of the target speech are learned in advance from a clean example. In the training process, we propose to learn a generic spectral model for the noise source by collecting various types of environmental noise and applying the established non-negative matrix factorization (NMF) framework. In the speech enhancement process, we propose to combine two existing group sparsity-inducing penalties in the optimization function and derive the corresponding parameter estimation algorithm based on the multiplicative update (MU) rule. Experimental results on mixtures containing different real-world noises confirm the effectiveness of our approach.
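A sketch of the enhancement stage follows. It assumes the speech dictionary `W_speech` and the generic noise dictionary `W_noise` have already been learned, and that only the activations H are estimated. The exact form of the two penalties is an assumption made for illustration and is not given in the abstract:

* a block-level term, the sum over noise blocks g of log(eps + ||H_g||_1);
* a row-level term, the sum over noise rows k of log(eps + ||h_k||_1).

The two terms are mixed by a hypothetical weight `alpha` and scaled by `lam`. The KL divergence, the restriction of the penalty to the noise activations, and the Wiener-like mask at the end are likewise illustrative choices.

```python
import numpy as np


def enhance_with_group_sparsity(V, W_speech, W_noise, noise_groups, lam=0.1,
                                alpha=0.5, n_iter=200, seed=0, eps=1e-12):
    """Sketch: with fixed dictionaries, minimize
    KL(V | W H) + lam * (alpha * block penalty + (1 - alpha) * row penalty)
    over H by multiplicative updates. Both penalties act on the noise
    activations only (assumed). Returns speech and noise power estimates."""
    W = np.hstack([W_speech, W_noise])
    Ks, K = W_speech.shape[1], W.shape[1]
    rng = np.random.default_rng(seed)
    H = rng.random((K, V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None]          # W^T 1, one value per component
    labels = np.unique(noise_groups)
    for _ in range(n_iter):
        WH = W @ H + eps
        Hn = H[Ks:]                            # noise activations (read before the update)
        # d/dH of sum_g log(eps + ||H_g||_1) is 1 / (eps + ||H_g||_1) on block g
        block_l1 = np.zeros(K - Ks)
        for g in labels:
            in_g = noise_groups == g
            block_l1[in_g] = Hn[in_g].sum()
        # d/dH of sum_k log(eps + ||h_k||_1) is 1 / (eps + ||h_k||_1) on row k
        row_l1 = Hn.sum(axis=1)
        penalty = np.zeros((K, 1))
        penalty[Ks:, 0] = lam * (alpha / (eps + block_l1) + (1 - alpha) / (eps + row_l1))
        # MU rule: negative gradient part over positive gradient part (incl. penalty)
        H *= (W.T @ (V / WH)) / (col_sums + penalty)
    S = W_speech @ H[:Ks]
    Nz = W_noise @ H[Ks:]
    mask = S / (S + Nz + eps)                  # Wiener-like soft mask (assumed)
    return mask * V, (1 - mask) * V
```

The penalty gradients are constant within a block or row, so a block whose total activation shrinks receives an ever larger penalty and is driven toward zero. This is how the combined term selects a few relevant noise examples from the generic model.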
Audio fingerprinting, also known as audio hashing, is a well-known and powerful technique for audio identification and synchronization. It involves two major steps: fingerprint (voice pattern) design and matching search. While the first step concerns the derivation of a robust and compact audio signature, the second step usually requires knowledge of the database and fast search algorithms. Although this technique has a wide range of real-world applications, to the best of the authors' knowledge, the most recent comprehensive survey of existing algorithms appeared more than eight years ago. Thus, in this paper, we present a more up-to-date review and, to emphasize the audio signal processing aspect, we focus our state-of-the-art survey on the fingerprint design step, for which various audio features and their tractable statistical models are discussed.
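The following sketch illustrates the two steps named above using one classic fingerprint design in the spirit of the Philips robust hash (Haitsma and Kalker). Each frame yields a binary sub-fingerprint from the signs of band-energy differences taken across frequency and time. Matching is done by minimizing the bit error rate over alignments. Several choices here are simplifying assumptions:

* the frame length, hop, and band count;
* equal-width bands, whereas the original design uses logarithmically spaced bands;
* the function names, which are hypothetical;
* the brute-force search, which stands in for the indexed fast search a real system would use.

```python
import numpy as np


def band_energies(x, frame_len=2048, hop=64, n_bands=33):
    """Frame the signal with a Hann window, take |FFT|^2, and sum it into
    n_bands equal-width frequency bands (simplification)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    edges = np.linspace(0, power.shape[1], n_bands + 1).astype(int)
    return np.stack([power[:, a:b].sum(axis=1) for a, b in zip(edges[:-1], edges[1:])], axis=1)


def fingerprint(x, **kw):
    """One (n_bands - 1)-bit sub-fingerprint per frame pair: the sign of the
    change over time of the band-to-band energy difference."""
    E = band_energies(x, **kw)
    d = E[:, :-1] - E[:, 1:]          # difference between adjacent bands
    return (d[1:] - d[:-1]) > 0       # shape (n_frames - 1, n_bands - 1), dtype bool


def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits between two equally long fingerprints."""
    return np.mean(fp_a != fp_b)


def best_match(fp_query, fp_ref):
    """Brute-force search: slide the query over the reference fingerprint and
    return (offset in frames, bit error rate) of the best alignment."""
    L = len(fp_query)
    bers = [bit_error_rate(fp_query, fp_ref[o:o + L]) for o in range(len(fp_ref) - L + 1)]
    o = int(np.argmin(bers))
    return o, float(bers[o])
```

Only the signs of energy differences are kept. Scaling the input by a constant therefore leaves every bit unchanged, which is one reason this kind of signature is robust to gain changes.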