Abstract: Complex nonnegative matrix factorization (NMF) is a powerful tool for decomposing audio spectrograms while accounting for some phase information in the time-frequency domain. While its estimation was originally based on the Euclidean distance, in this paper we propose to extend it to any beta-divergence, a family of functions widely used in audio to estimate NMF. To this end, we introduce the beta-divergence in a heuristic fashion within a phase-aware probabilistic model. Estimating this model results in perfo…
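For readers unfamiliar with the beta-divergence family mentioned in the abstract, the following is a minimal sketch of its standard element-wise definition, summed over all entries. It is not the estimator proposed in the paper; the function name `beta_divergence` is hypothetical, and the sketch assumes strictly positive inputs.

```python
import numpy as np


def beta_divergence(x, y, beta):
    """Sum of element-wise beta-divergences d_beta(x | y).

    Assumes x and y are strictly positive and have the same shape.
    beta = 2 gives half the squared Euclidean distance, beta = 1 the
    generalized Kullback-Leibler divergence, beta = 0 the Itakura-Saito
    divergence.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 0:
        # Itakura-Saito divergence (limit case beta -> 0)
        ratio = x / y
        d = ratio - np.log(ratio) - 1.0
    elif beta == 1:
        # Generalized Kullback-Leibler divergence (limit case beta -> 1)
        d = x * np.log(x / y) - x + y
    else:
        # General case, valid for any beta other than 0 and 1
        d = (x**beta + (beta - 1.0) * y**beta
             - beta * x * y**(beta - 1.0)) / (beta * (beta - 1.0))
    return float(np.sum(d))
```

With `beta=2` the function reduces to half the squared Euclidean distance, which is the cost on which complex NMF was originally based according to the abstract.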
“…The term F(h) in Equation (16) as the function of the Gibbs distribution is essential for simplifying the adaptive optimization of λ. The maximum-likelihood (ML) estimation of λ can be decomposed as follows:…”
“…The complex nonnegative matrix factorization (CMF) extends the NMF model by combining a sparse representation with the complex-spectrum domain to improve audio separability. The CMF can extract the recurrent patterns of the phase estimates and magnitude spectra of constituent signals [16][17][18]. Nevertheless, the CMF lacks a general mechanism for controlling the sparseness of the code.…”
This paper proposes a solution for event classification from a single noisy mixture that consists of two major steps: sound-event separation and sound-event classification. The traditional complex nonnegative matrix factorization (CMF) is extended with an optimal adaptive L1 sparsity to decompose a noisy single-channel mixture. The proposed adaptive L1 sparsity CMF algorithm encodes the spectral patterns and estimates the phases of the original signals in the time-frequency representation. These features efficiently enhance the temporal decomposition process. A support vector machine (SVM) based one-versus-one (OvsO) strategy is applied with a mean supervector to categorize each demixed sound into the matching sound-event class. The first step of the multi-class MSVM method is to segment the separated signal into blocks by sliding over the demixed signals and then to encode three features for each block. Mel-frequency cepstral coefficients, short-time energy, and short-time zero-crossing rate are learned for multiple sound-event classes by the SVM-based OvsO method. The mean supervector is encoded from the obtained features. The proposed method has been evaluated in both separation and classification scenarios using real-world single recorded signals and compared with the state-of-the-art separation method. Experimental results confirmed that the proposed method outperformed the state-of-the-art methods.
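Two of the three block features named in the abstract above, short-time energy and short-time zero-crossing rate, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the helper names `frame_signal`, `short_time_energy`, and `zero_crossing_rate`, as well as the frame length and hop size, are assumptions, and the input is assumed to be at least one frame long.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row).

    Assumes len(x) >= frame_len; trailing samples that do not fill a
    whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
```

In a pipeline like the one described, such per-block values would be combined with Mel-frequency cepstral coefficients before being passed to a classifier; that step is omitted here.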
“…Many algorithms have been proposed to solve the above-mentioned problem. In this article, a complex-valued extension of NMF (complex NMF: CNMF) [18][19][20] and NMF based on complex generative models [3][21][22][23] are reviewed.…”
Section: Problem in NMF-based Modeling (mentioning)
confidence: 99%
“…Similar to NMF, the iterative update rules for CNMF can be derived by an auxiliary function technique [18]. In recent years, the similarity function in CNMF has been generalized to the β-divergence, which includes the generalized KL divergence and IS divergence [19,20]. As described above, CNMF assumes the additivity of complex-valued spectrogram components and the low rank of the amplitude spectrogram, resulting in an appropriate decomposition model without ignoring phase information.…”
Section: CNMF Employing Phase Spectra (mentioning)
confidence: 99%
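The two assumptions named in the statement above, additivity of complex-valued components and a low-rank amplitude spectrogram, correspond to the usual CNMF model in which each component has amplitude W[i, k] * H[k, j] and its own phase. The following is a minimal sketch of that model under assumed array shapes; the function name `cnmf_model` and the variable names are hypothetical and do not come from the cited article.

```python
import numpy as np


def cnmf_model(W, H, phase):
    """Build the complex spectrogram model of complex NMF.

    W     : (I, K) nonnegative basis matrix
    H     : (K, J) nonnegative activation matrix
    phase : (I, K, J) phase in radians of each component in each
            time-frequency slot

    Returns the (I, J) complex matrix
        X[i, j] = sum_k W[i, k] * H[k, j] * exp(1j * phase[i, k, j]).
    """
    return np.einsum("ik,kj,ikj->ij", W, H, np.exp(1j * phase))
```

When all phases are zero, the model reduces to the ordinary NMF product `W @ H`, which illustrates how the phase term is what distinguishes CNMF from amplitude-only NMF.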
“…the sum of r.v.s can be modeled by the sum of the first-order expectations (scale parameters), which are positive. The complex-valued spectral components c_{ij,k} are assumed to obey (20), and the scale parameters defined in each time-frequency slot correspond to the expectation of the amplitude values, E[|c_{ij,k}|]. Since the complex Cauchy distribution has the stable property, the generative model of the observed spectrum…”
Nonnegative matrix factorization (NMF) is a powerful technique for extracting meaningful patterns from an observed matrix and has been used in many applications in the audio signal processing field. In this article, the principle of NMF and some extensions based on a complex generative model are reviewed. Their application to audio source separation is also presented.