In this paper, the problem of single-microphone source separation via Nonnegative Matrix Factorization (NMF) is addressed by exploiting video information. The audio and video modalities of a single speaker's speech usually vary similarly over time: a change in one of them usually corresponds to a change in the other. The activation coefficient matrices of their NMF decompositions are therefore expected to be similar. Based on this similarity, the activation coefficient matrix of the video modality is used to initialize NMF-based audio source separation. The same similarity is also used for post-processing and for clustering the rows of the activation coefficient matrix obtained from randomly initialized NMF. Simulation results confirm the effectiveness of the proposed multimodal approaches for single-microphone source separation.
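The video-based initialization and the similarity-based clustering can be illustrated with Euclidean NMF and Lee-Seung multiplicative updates. This is a minimal sketch under stated assumptions, not the paper's exact algorithm: it assumes a nonnegative audio spectrogram `V_audio` and a video activation matrix `H_video` that are already time-aligned (same number of frames) with one row per component, it uses Wiener-like ratio masks for the per-component estimates, and it uses Pearson correlation as the similarity measure for clustering. All function names are hypothetical.

```python
import numpy as np

def nmf_mu(V, W, H, n_iter=200, eps=1e-12):
    """Euclidean NMF, V ~ W @ H, via Lee-Seung multiplicative updates."""
    W, H = W.copy(), H.copy()
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_video_init(V_audio, H_video, n_iter=200, seed=0):
    """Hypothetical helper: start the audio activations from the video ones.

    Assumes H_video is nonnegative, has one row per component and is
    time-aligned with V_audio (same number of columns/frames).
    """
    rng = np.random.default_rng(seed)
    K = H_video.shape[0]
    W0 = rng.random((V_audio.shape[0], K)) + 1e-3
    # Exact zeros never change under multiplicative updates, so add a small floor.
    H0 = np.maximum(H_video, 0) + 1e-3
    return nmf_mu(V_audio, W0, H0, n_iter=n_iter)

def soft_masks(V, W, H, eps=1e-12):
    """Split V into per-component estimates with Wiener-like ratio masks."""
    WH = W @ H + eps
    return [np.outer(W[:, k], H[k]) / WH * V for k in range(W.shape[1])]

def cluster_rows_by_video(H_audio, H_video_sources):
    """Assign each audio activation row to the video envelope it best correlates with.

    H_video_sources: one activation envelope per source, shape (S, T).
    Returns an integer label in [0, S) for every row of H_audio.
    """
    labels = []
    for h in H_audio:
        corr = [np.corrcoef(h, g)[0, 1] for g in H_video_sources]
        labels.append(int(np.argmax(corr)))
    return np.array(labels)
```

After separation, the per-component estimates from `soft_masks` would be grouped by the labels from `cluster_rows_by_video` and summed to obtain one estimate per source.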
In this paper, the problem of convolutive source separation via multimodal soft Nonnegative Matrix Co-Factorization (NMCF) is addressed. Different aspects of a phenomenon may be recorded by sensors of different types (e.g., audio and video of human speech), and each of these recorded signals is called a modality. Since the modalities share the same underlying phenomenon, they have some similarities; in particular, they usually have similar time changes. A change in one of them usually corresponds to a change in the other, so their active and inactive periods are usually similar. Under this assumption, the activation coefficient matrices of their Nonnegative Matrix Factorizations (NMF) are expected to have a similar form. In this paper, this similarity between the activation coefficient matrices of the modalities is exploited for co-factorization. It is imposed in a soft manner through penalty terms rather than as a hard constraint, which gives the separation procedure more flexibility. Simulation results and comparisons with state-of-the-art algorithms show the effectiveness of the proposed algorithm.
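The idea of coupling the modalities softly through a penalty term can be illustrated with a deliberately simplified sketch. It assumes an instantaneous (not convolutive) Frobenius-norm cost, time-aligned modality matrices `Va` and `Vv` with the same number of frames, and a quadratic penalty, so it minimizes ||Va - Wa Ha||² + ||Vv - Wv Hv||² + lam·||Ha - Hv||². The paper's convolutive model, cost functions and penalty terms may differ; `soft_nmcf` and `coupled_cost` are hypothetical names.

```python
import numpy as np

def coupled_cost(Va, Vv, Wa, Ha, Wv, Hv, lam):
    """Two NMF fits plus a quadratic penalty tying the activations together."""
    return float(np.sum((Va - Wa @ Ha) ** 2) + np.sum((Vv - Wv @ Hv) ** 2)
                 + lam * np.sum((Ha - Hv) ** 2))

def soft_nmcf(Va, Vv, K, lam=1.0, n_iter=300, seed=0, eps=1e-12):
    """Instantaneous soft co-factorization (simplifying assumption).

    Va, Vv: nonnegative matrices sharing the frame axis (same column count).
    Returns Wa, Ha, Wv, Hv and the cost after every iteration.
    """
    rng = np.random.default_rng(seed)
    T = Va.shape[1]
    assert Vv.shape[1] == T, "modalities must be time-aligned"
    Wa = rng.random((Va.shape[0], K)) + 1e-3
    Wv = rng.random((Vv.shape[0], K)) + 1e-3
    Ha = rng.random((K, T)) + 1e-3
    Hv = rng.random((K, T)) + 1e-3

    history = [coupled_cost(Va, Vv, Wa, Ha, Wv, Hv, lam)]
    for _ in range(n_iter):
        # The penalty gradient lam*(Ha - Hv) is split into +lam*Ha (denominator)
        # and -lam*Hv (numerator), so the multiplicative update stays nonnegative.
        Ha *= (Wa.T @ Va + lam * Hv) / (Wa.T @ Wa @ Ha + lam * Ha + eps)
        Wa *= (Va @ Ha.T) / (Wa @ Ha @ Ha.T + eps)
        Hv *= (Wv.T @ Vv + lam * Ha) / (Wv.T @ Wv @ Hv + lam * Hv + eps)
        Wv *= (Vv @ Hv.T) / (Wv @ Hv @ Hv.T + eps)
        history.append(coupled_cost(Va, Vv, Wa, Ha, Wv, Hv, lam))
    return Wa, Ha, Wv, Hv, history
```

In this sketch, `lam=0` decouples the two factorizations, while a very large `lam` pushes `Ha` and `Hv` toward equality, approaching a hard co-factorization with a shared activation matrix; intermediate values give the soft coupling.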
Tensor completion is an important problem in big data processing. Data acquired from different aspects of a multimodal phenomenon or from different sensors are often incomplete for reasons such as noise, low sampling rates, or human error. In this situation, recovering the missing or uncertain elements of the incomplete dataset is an important step toward efficient data processing. In this paper, a new completion approach using Tensor Ring (TR) decomposition in an embedded space is proposed. The incomplete data tensor is first transformed into a higher-order tensor using block Hankelization. The higher-order tensor is then completed using TR decomposition with a rank-incremental, multistage strategy. Simulation results show the effectiveness of the proposed approach compared with state-of-the-art completion algorithms, especially at very high missing ratios and in noisy cases.
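The block Hankelization step and the Tensor Ring format can be sketched in NumPy. This assumes the embedding is a multiway delay embedding along every mode with window sizes `taus`, laid out in the order returned by `numpy.lib.stride_tricks.sliding_window_view` (the paper's layout may order the modes differently), and that the inverse map averages all copies of each entry. `tr_full` contracts TR cores of shape (R_k, I_k, R_{k+1}) and closes the ring with a trace. The function names are hypothetical, and the rank-incremental, multistage TR completion itself is not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def block_hankelize(X, taus):
    """Delay-embed every mode of X with window sizes `taus`.

    Returns an order-2N array of shape (I1-t1+1, ..., IN-tN+1, t1, ..., tN)
    whose entry [j1..jN, a1..aN] equals X[j1+a1, ..., jN+aN].
    """
    return sliding_window_view(X, tuple(taus)).copy()

def inverse_block_hankelize(H, shape):
    """Fold an embedded tensor back to `shape` by averaging duplicated entries."""
    N = len(shape)
    outer, taus = H.shape[:N], H.shape[N:]
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for j in np.ndindex(*outer):
        block = tuple(slice(jk, jk + tk) for jk, tk in zip(j, taus))
        acc[block] += H[j]
        cnt[block] += 1
    return acc / cnt

def tr_full(cores):
    """Full tensor from TR cores G_k of shape (R_k, I_k, R_{k+1}), R_{N+1} = R_1."""
    out = cores[0]
    for G in cores[1:]:
        # Contract the trailing rank index with the next core's leading one.
        out = np.tensordot(out, G, axes=([-1], [0]))
    # Close the ring: trace over the first and last rank indices.
    return np.trace(out, axis1=0, axis2=-1)
```

Each entry of the original tensor appears in several positions of the embedded tensor, so missing entries are duplicated too; after the embedded tensor is completed, the averaging inverse maps it back to the original shape.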