We propose a novel algorithm for adaptive blind audio source extraction. The method is based on independent vector analysis and uses auxiliary function optimization to achieve fast convergence. The algorithm is partially supervised by a pilot signal related to the source of interest (SOI), which ensures that the method extracts the utterance of the desired speaker. The pilot is obtained by identifying the dominant speaker in the mixture using x-vectors. The properties of x-vectors computed in the presence of cross-talk are analyzed experimentally. The proposed approach is verified in a scenario with a moving SOI, a static interfering speaker, and environmental noise.
In this paper, we propose a novel algorithm for blind source extraction (BSE) of a moving acoustic source recorded by multiple microphones. The algorithm is based on independent vector extraction (IVE), where the contrast function is optimized using the auxiliary function-based technique and the recently proposed constant separating vector (CSV) mixing model is assumed. CSV allows for movements of the extracted source within the analyzed batch of recordings. We provide a practical explanation of how the CSV model works when extracting a moving acoustic source. The proposed algorithm is then experimentally verified on the task of blind extraction of a moving speaker. It is compared with state-of-the-art blind methods and with an adaptive BSE algorithm that processes data sequentially. The results confirm that the proposed algorithm extracts the moving speaker better than BSE methods based on the conventional mixing model and that it achieves higher extraction accuracy than the adaptive method.
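The idea behind a constant separating vector can be illustrated with a small numerical sketch. It is not the blind algorithm from the abstract: it assumes the per-block mixing vectors `a_t` are known, ignores interference and noise, and only shows that one vector `w` can satisfy `w^H a_t = 1` for several source positions when the number of blocks does not exceed the number of microphones. The function name `constant_separating_vector` and all dimensions are hypothetical.

```python
import numpy as np

def constant_separating_vector(A):
    """Minimum-norm w with w^H a_t = 1 for every column a_t of A (d x T, T <= d)."""
    ones = np.ones(A.shape[1], dtype=complex)
    # Solve A^H w = 1; lstsq returns the minimum-norm solution when T < d.
    w, *_ = np.linalg.lstsq(A.conj().T, ones, rcond=None)
    return w

rng = np.random.default_rng(0)
d, T, N = 6, 3, 200  # microphones, blocks, samples per block (illustrative values)

# One mixing vector per block: the source is at a different position in each block.
A = rng.standard_normal((d, T)) + 1j * rng.standard_normal((d, T))
w = constant_separating_vector(A)

# Noise-free observations of the moving source: x_t(n) = a_t * s_t(n).
s = rng.laplace(size=(T, N)) + 1j * rng.laplace(size=(T, N))
x = A.T[:, :, None] * s[:, None, :]          # shape (T, d, N)

# The same w is applied in every block.
s_hat = np.einsum("d,tdn->tn", w.conj(), x)  # shape (T, N)
```

In a blind setting the mixing vectors are unknown, so `w` would have to be estimated from the statistical independence of the source and the background rather than from a linear system as above.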
We propose a novel approach for semi-supervised extraction of a moving audio source of interest (SOI) applicable in reverberant and noisy environments. The blind part of the method is based on independent vector extraction (IVE) and uses the recently proposed constant separating vector (CSV) mixing model. This model allows the mixing parameters to change within the processed interval of the mixture, which potentially leads to higher accuracy of SOI estimation. The supervised part of the method concerns a pilot signal, which is related to the SOI and ensures the convergence of the blind method towards the SOI. The pilot is based on robust detection of frames where the SOI is dominant, using speaker embeddings called X-vectors. Robustness of the detection is achieved by augmenting the data used for the supervised training of the X-vectors. The pilot-supported extraction yields significantly better performance than its unsupervised counterpart, which identifies the SOI solely through the initialization.
Independent Vector Extraction (IVE) is a modification of Independent Vector Analysis (IVA) for Blind Source Extraction (BSE), that is, for a setup in which only one source of interest (SOI) should be separated from a mixture of signals observed by microphones. The fundamental assumption is that the SOI is independent of the other signals. IVE shows reasonable results; however, its basic variant is limited to static sources. To extract a moving source, IVE has recently been extended by considering the Constant Separating Vector (CSV) mixing model. It enables us to estimate a separating filter that extracts the SOI from a wider spatial area through which the source has moved. However, only slow gradient-based algorithms were proposed in the pioneering papers on IVE and CSV. In this paper, we experimentally verify the applicability of the CSV mixing model and propose new IVE methods derived by modifying the auxiliary function-based algorithm for IVA. Piloted variants of the methods, which offer partially controllable global convergence, are proposed as well. The methods are verified under reverberant and noisy conditions using model-based as well as real-world acoustic impulse responses. They are also verified within the CHiME-4 speech separation and recognition challenge. The experiments corroborate the applicability of the CSV mixing model for blind extraction of a moving source as well as the improved convergence of the proposed algorithms.
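To give a feel for what an auxiliary function-based update looks like in contrast to a gradient step, the following is a minimal sketch and not the authors' implementation. It assumes a single block (static source, so no CSV aggregation over blocks), a weighting of the form phi(r) = 1/r, and a mixing vector tied to the separating vector through an orthogonal constraint. The function name `auxive_step` and the synthetic data are hypothetical.

```python
import numpy as np

def auxive_step(X, W, eps=1e-8):
    """One auxiliary-function-style update (sketch under the stated assumptions).

    X: (K, d, N) mixtures for K frequency bins, d microphones, N frames.
    W: (K, d) current separating vectors.
    Returns the updated W and the mixing vectors A (K, d) used in the update.
    """
    K, d, N = X.shape
    S = np.einsum("kd,kdn->kn", W.conj(), X)       # current SOI estimate per bin
    r = np.sqrt(np.sum(np.abs(S) ** 2, axis=0))    # joint amplitude across bins
    phi = 1.0 / np.maximum(r, eps)                 # assumed weighting phi(r) = 1/r
    W_new = np.empty_like(W)
    A = np.empty_like(W)
    for k in range(K):
        Xk = X[k]
        C = Xk @ Xk.conj().T / N                   # sample covariance
        V = (Xk * phi) @ Xk.conj().T / N           # weighted covariance
        Cw = C @ W[k]
        A[k] = Cw / (W[k].conj() @ Cw)             # mixing vector from orthogonal constraint
        Va = np.linalg.solve(V, A[k])
        W_new[k] = Va / (A[k].conj() @ Va)         # scaled so that w^H a = 1
    return W_new, A

# Synthetic example: super-Gaussian SOI with an envelope shared across bins, Gaussian noise.
rng = np.random.default_rng(0)
K, d, N = 4, 3, 2000
envelope = rng.laplace(size=N)
s = envelope * np.exp(2j * np.pi * rng.random((K, N)))
a_true = rng.standard_normal((K, d)) + 1j * rng.standard_normal((K, d))
noise = 0.5 * (rng.standard_normal((K, d, N)) + 1j * rng.standard_normal((K, d, N)))
X = a_true[:, :, None] * s[:, None, :] + noise

W = np.tile(np.eye(d, dtype=complex)[0], (K, 1))   # initialise toward microphone 1
for _ in range(5):
    W, A = auxive_step(X, W)
```

Each step solves a small linear system per frequency bin instead of following a gradient with a step size, which is the usual reason such updates need fewer iterations. A CSV variant would additionally have to combine statistics from several blocks, which this sketch does not attempt.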