This paper describes a computationally-efficient blind source separation (BSS) method based on the independence, lowrankness, and directivity of the sources. A typical approach to BSS is unsupervised learning of a probabilistic model that consists of a source model representing the time-frequency structure of source images and a spatial model representing their interchannel covariance structure. Building upon the low-rank source model based on nonnegative matrix factorization (NMF), which has been considered to be effective for inter-frequency source alignment, multichannel NMF (MNMF) assumes source images to follow multivariate complex Gaussian distributions with unconstrained full-rank spatial covariance matrices (SCMs). An effective way of reducing the computational cost and initialization sensitivity of MNMF is to restrict the degree of freedom of SCMs. While a variant of MNMF called independent low-rank matrix analysis (ILRMA) severely restricts SCMs to rank-1 matrices under an idealized condition that only directional and lessechoic sources exist, we restrict SCMs to jointly-diagonalizable yet full-rank matrices in a frequency-wise manner, resulting in FastMNMF1. To help inter-frequency source alignment, we then propose FastMNMF2 that shares the directional feature of each source over all frequency bins. To explicitly consider the directivity or diffuseness of each source, we also propose rankconstrained FastMNMF that enables us to individually specify the ranks of SCMs. Our experiments showed the superiority of FastMNMF over MNMF and ILRMA in speech separation and the effectiveness of the rank constraint in speech enhancement.
This paper describes a versatile method that accelerates multichannel source separation methods based on full-rank spatial modeling. A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain. One of the most successful examples of this approach is multichannel nonnegative matrix factorization (MNMF) based on a full-rank spatial model and a low-rank source model. MNMF, however, is computationally expensive and often works poorly due to the difficulty of estimating the unconstrained full-rank SCMs. Instead of restricting the SCMs to rank-1 matrices with the severe loss of the spatial modeling ability as in independent low-rank matrix analysis (ILRMA), we restrict the SCMs of each frequency bin to jointly-diagonalizable but still full-rank matrices. For such a fast version of MNMF, we propose a computationally-efficient and convergence-guaranteed algorithm that is similar in form to that of ILRMA. Similarly, we propose a fast version of a stateof-the-art speech enhancement method based on a deep speech model and a low-rank noise model. Experimental results showed that the fast versions of MNMF and the deep speech enhancement method were several times faster and performed even better than the original versions of those methods, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.