Relative group sparsity for non-negative matrix factorization with application to on-the-fly audio source separation

Badawy, Dalia El; Ozerov, Alexey; Duong, Ngoc Q. K.

doi:10.1109/icassp.2015.7177971

Cited by 6 publications

(8 citation statements)

References 17 publications

“…This article extends our preliminary work [24], [27] by providing the algorithms along with their mathematical derivations in addition to new results from a user test. Altogether, the main contributions of our proposed on-the-fly paradigm work are four-fold:…”

Section: Introduction (mentioning)

confidence: 97%

On-the-Fly Audio Source Separation—A Novel User-Friendly Framework

Badawy

Duong

Ozerov

2017

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

Abstract: This article addresses the challenging problem of single-channel audio source separation. We introduce a novel user-guided framework where source models that govern the separation process are learned on-the-fly from audio examples retrieved online. The user only provides the search keywords that describe the sources in the mixture. In this framework, the generic spectral characteristics of each source are modeled by a universal sound class model learned from the retrieved examples via non-negative matrix factorization. We propose several group sparsity-inducing constraints in order to efficiently exploit a relevant subset of the universal model adapted to the mixture to be separated. We then derive the corresponding multiplicative update rules for parameter estimation. Separation results obtained from automated and user tests on mixtures containing various types of sounds confirm the effectiveness of the proposed framework.

Index Terms: On-the-fly audio source separation, user-guided, non-negative matrix factorization, group sparsity, universal sound class model.
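To make the idea in this abstract concrete, the following is a minimal sketch of estimating activations for a fixed universal model under a group sparsity penalty, using multiplicative updates. It is not the authors' algorithm: the KL divergence, the log/l1 penalty form, and every name (`group_sparse_activations`, `groups`, `lam`, `eps`) are assumptions chosen for illustration.

```python
import numpy as np

def group_sparse_activations(V, W, groups, lam=1.0, eps=1e-9, n_iter=100, seed=0):
    """Estimate H >= 0 such that V ~= W @ H, keeping the dictionary W fixed.

    Assumed objective (illustrative only):
        KL(V | W H) + lam * sum_g log(eps + ||H_g||_1)
    where H_g holds the rows of H listed in groups[g].
    """
    rng = np.random.default_rng(seed)
    n_components, n_frames = W.shape[1], V.shape[1]
    H = rng.random((n_components, n_frames)) + 1e-3  # strictly positive start
    ones = np.ones_like(V)
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # Gradient of the penalty: lam / (eps + ||H_g||_1) for each row in group g.
        penalty = np.zeros((n_components, 1))
        for idx in groups:
            penalty[idx] = lam / (eps + H[idx].sum())
        # Multiplicative update: negative gradient part over positive gradient part.
        H *= (W.T @ (V / V_hat)) / (W.T @ ones + penalty)
    return H

# Toy usage: two groups of three spectral patterns, only the first group is active.
rng = np.random.default_rng(1)
W = rng.random((20, 6))
H_true = np.zeros((6, 15))
H_true[:3] = rng.random((3, 15))
V = W @ H_true
H = group_sparse_activations(V, W, groups=[[0, 1, 2], [3, 4, 5]], lam=0.5)
```

Because the update only multiplies by non-negative ratios, `H` stays non-negative throughout; the penalty term grows for groups with little total activation, which pushes those groups further toward zero.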

Section: Introduction (mentioning)

confidence: 97%

“…prevents them from vanishing entirely). In other words, the group sparsity property is now considered relative to the corresponding supergroup H (j) and not within the full set of coefficients in H. It is formulated as [27] …”

Section: Group Sparsity (mentioning)

confidence: 99%

On-the-Fly Audio Source Separation—A Novel User-Friendly Framework

Badawy

Duong

Ozerov

2017

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

“…The term universal model is also in analogy to the universal background models for speaker verification addressed in [10]. This idea of using a generic spectral model was then exploited in the context of on-the-fly source separation [11,12] where any kind of audio sources can be separated with the guidance from its examples collected from a search engine. Motivated from those above-mentioned works, we propose in this paper to learn two generic spectral models for speech and background noise independently in advance.…”

Section: Introduction (mentioning)

confidence: 99%

“…Firstly, compared to [8] where only the universal speech model was pre-learned and noise model was adapted during the separation process, we consider to learn the universal noise model also since noisy examples can be easily collected in advance and it would potentially improve the separation quality. Secondly, compared to [8] and [11,12] where either block sparsity-inducing penalty or component-sparsity-inducing penalty was used, we propose in this paper a combination of these two penalties which would offer better estimating the parameters in the model fitting.…”

Section: Introduction (mentioning)

confidence: 99%

Speech enhancement based on nonnegative matrix factorization with mixed group sparsity constraint

Duong

Nguyen

et al. 2015

Proceedings of the Sixth International Symposium on Information and Communication Technology

Self Cite

This paper addresses a challenging single-channel speech enhancement problem in real-world environments where the speech signal is corrupted by high-level background noise. While most state-of-the-art algorithms try to estimate the noise spectral power and filter it from the observed one to obtain enhanced speech, the paper discloses another approach inspired by audio source separation techniques. In the considered method, generic spectral characteristics of speech and noise are first learned from various training signals by non-negative matrix factorization (NMF). They are then used to guide the similar factorization of the observed power spectrogram into a speech part and a noise part. Additionally, we propose to combine two existing group sparsity-inducing penalties in the optimization process and adapt the corresponding algorithm for parameter estimation based on the multiplicative update (MU) rule. Experimental results over different settings confirm the effectiveness of the proposed approach.
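As a rough illustration of what combining two group sparsity-inducing penalties could look like, the sketch below evaluates a weighted sum of a block-level penalty (one term per group of activation rows) and a component-level penalty (one term per row). The log/l1 form, the convex weight `alpha`, and the names `mixed_group_penalty` and `groups` are assumptions made for this example; the abstract does not specify the paper's exact formulation.

```python
import numpy as np

def mixed_group_penalty(H, groups, alpha=0.5, eps=1e-9):
    """Weighted sum of a block penalty and a component penalty (assumed form).

    block:     sum over groups g of log(eps + ||H_g||_1)
    component: sum over rows k of   log(eps + ||h_k||_1)
    alpha = 1 keeps only the block term; alpha = 0 keeps only the component term.
    """
    block = sum(np.log(eps + H[idx].sum()) for idx in groups)
    component = np.log(eps + H.sum(axis=1)).sum()
    return alpha * block + (1.0 - alpha) * component

# When every group contains exactly one row, both terms coincide.
H_demo = np.ones((2, 2))
penalty_value = mixed_group_penalty(H_demo, groups=[[0], [1]], alpha=0.3)
```

Setting `alpha` between 0 and 1 trades off switching whole groups off against switching individual components off within the groups that remain active.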

“…Furthermore, some modified group sparsity constraints have been proposed to improve the performance. For example, Badawy proposed relative group sparsity [13] to prevent the activations corresponding to one universal source model from vanishing altogether. Hurmalainen introduced a quadratic penalty function into group sparsity that permits dynamic relationships between basis vectors or groups, since the basic form of group sparsity assumes the independence of different groups without considering which groups will activate, alone or together [14].…”

Section: Introduction (mentioning)

confidence: 99%

Adaptive Group Sparsity for Non-Negative Matrix Factorization with Application to Unsupervised Source Separation

Liu

Wang

et al. 2016

Interspeech 2016

Non-negative matrix factorization (NMF) is an appealing technique for many audio applications, such as automatic music transcription, source separation and speech enhancement. Sparsity constraints are commonly used on the NMF model to discover a small number of dominant patterns. Recently, group sparsity has been proposed for NMF based methods, in which basis vectors belonging to a same group are permitted to activate together, while activations across groups are suppressed. However, most group sparsity models penalize all groups using a same parameter without considering the relative importance of different groups for modeling the input data. In this paper, we propose adaptive group sparsity to model the relative importance of different groups with adaptive penalty parameters and investigate its potential benefit to separate speech from other sound sources. Experimental results show that the proposed adaptive group sparsity improves the performance over regular group sparsity in unsupervised settings where neither the speaker identity nor the type of noise is known in advance.
