This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF). IVA is a state-of-the-art technique that utilizes the statistical independence between sources in a mixture signal, and an efficient optimization scheme has been proposed for IVA. However, since the source model in IVA is based on a spherical multivariate distribution, IVA cannot utilize specific spectral structures such as the harmonic structures of pitched instrumental sounds. To solve this problem, we introduce NMF decomposition as the source model in IVA to capture the spectral structures. The formulation of the proposed method is derived from conventional multichannel NMF (MNMF), which reveals the relationship between MNMF and IVA. The proposed method can be optimized by the update rules of IVA and single-channel NMF. Experimental results show the efficacy of the proposed method compared with IVA and MNMF in terms of separation accuracy and convergence speed.
Index Terms: Blind source separation, determined, independent vector analysis, nonnegative matrix factorization.
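As a rough illustration of the single-channel NMF update rules mentioned in the abstract above, the following minimal sketch factorizes a nonnegative spectrogram-like matrix with the classic multiplicative updates for the Euclidean cost. The cost function is an assumption chosen for simplicity; the abstract does not state which divergence the proposed method uses, and the IVA demixing-matrix updates are not shown. The name `nmf_euclidean` and its parameters are hypothetical.

```python
import numpy as np


def nmf_euclidean(V, n_bases, n_iter=100, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix V (freq x frames) as W @ H.

    Assumption: Euclidean-distance cost with multiplicative updates,
    used here only to illustrate single-channel NMF in general.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random nonnegative initialization of bases W and activations H.
    W = rng.random((n_freq, n_bases)) + eps
    H = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic nonnegative "spectrogram" with an exact rank-3 structure.
    V = rng.random((20, 3)) @ rng.random((3, 30))
    W, H = nmf_euclidean(V, n_bases=3, n_iter=200)
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

Each column of `W` plays the role of a spectral basis (for example, a harmonic pattern of one note) and each row of `H` its activation over time, which is the kind of spectral structure a spherical source model cannot represent.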
In this paper, we report the results of the 2016 community-based Signal Separation Evaluation Campaign (SiSEC 2016). This edition comprises four tasks. Three focus on the separation of speech and music audio recordings, while one concerns biomedical signals. We summarize these tasks and the performance of the submitted systems, and briefly discuss future trends of SiSEC.
This paper presents a deep neural network (DNN)-based method for phase reconstruction from amplitude spectrograms. In audio signal and speech processing, the amplitude spectrogram is often used for processing, and the corresponding phase spectrogram is reconstructed from the amplitude spectrogram on the basis of the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic speech. To address this problem, we introduce the von-Mises-distribution DNN for phase reconstruction. The DNN is a generative model based on the von Mises distribution, which can model distributions of a periodic variable such as phase, and the model parameters of the DNN are estimated on the basis of the maximum likelihood criterion. Furthermore, we propose a group-delay loss for DNN training to make the predicted group delay close to a natural group delay. The experimental results demonstrate that 1) the trained DNN can predict group delay more accurately than the phases themselves, and 2) our phase reconstruction methods achieve better speech quality than the conventional Griffin-Lim method.
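To make the two training criteria in the abstract above more concrete, here is a minimal NumPy sketch of a von Mises negative log-likelihood and a cosine-based group-delay loss. Several things are assumptions rather than the paper's definitions: the concentration `kappa` is treated as a fixed scalar, group delay is approximated by the negative first difference of phase along the frequency axis, and the exact form of the group-delay loss is illustrative. The function names are hypothetical.

```python
import numpy as np


def von_mises_nll(phase, mu, kappa=1.0):
    """Mean negative log-likelihood of phases under a von Mises density.

    p(y; mu, kappa) = exp(kappa * cos(y - mu)) / (2 * pi * I0(kappa)),
    where I0 is the modified Bessel function of order zero (np.i0).
    Assumption: kappa is a fixed scalar rather than a network output.
    """
    log_norm = np.log(2.0 * np.pi * np.i0(kappa))
    return np.mean(-kappa * np.cos(phase - mu) + log_norm)


def group_delay_loss(phase, mu):
    """Cosine distance between target and predicted group delays.

    Assumption: arrays are shaped (freq, frames) and group delay is
    approximated by the negative phase difference between adjacent bins.
    """
    gd_true = -np.diff(phase, axis=0)
    gd_pred = -np.diff(mu, axis=0)
    return np.mean(1.0 - np.cos(gd_true - gd_pred))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phase = rng.uniform(-np.pi, np.pi, size=(8, 5))
    print("NLL, exact prediction:  ", von_mises_nll(phase, phase))
    print("NLL, offset prediction: ", von_mises_nll(phase, phase + 1.0))
    print("GD loss, offset pred.:  ", group_delay_loss(phase, phase + 1.0))
```

Because both losses depend on the phase only through a cosine, they are unaffected by 2π wrapping, and the group-delay term ignores a constant phase offset across frequency, which is one plausible reason group delay may be easier to predict than raw phase.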
In this paper, we propose a new framework called independent deeply learned matrix analysis (IDLMA), which unifies a deep neural network (DNN) and independence-based multichannel audio source separation. IDLMA utilizes both pretrained DNN source models and statistical independence between sources for the separation, where the time-frequency structures of each source are iteratively optimized by a DNN while enhancing the estimation accuracy of the spatial demixing filters. As the source generative model, we introduce a complex heavy-tailed distribution to improve the separation performance. In addition, we address a semi-supervised situation; namely, a solo-recorded audio dataset can be prepared for only one source in the mixture signal. To solve the limited-data problem, we propose an appropriate data augmentation method to adapt the DNN source models to the observed signal, which enables IDLMA to work even in the semi-supervised situation. Experiments are conducted using music signals with a training dataset in both supervised and semi-supervised situations. The results show the validity of the proposed method in terms of the separation accuracy.