This work presents a methodology for the joint calibration and synchronization of two arrays of microphones and loudspeakers. The problem is modeled as estimation of the rigid motion of one array with respect to the other, as well as estimation of the synchronization mismatch between the two. The proposed method uses dedicated signals emitted by the loudspeakers of the two arrays to compute a set of time of arrival (TOA) estimates. Through a simple transformation, estimated TOAs are converted into a set of linearly independent time difference of arrival (TDOA) measurements, which are modeled by a system of nonlinear equations in the unknown parameters of interest. A maximum likelihood estimate is then given as the solution to a nonlinear weighted least squares (NWLS) problem, which is optimized applying a parallelizable variant of Particle Swarm Optimization (PSO). In this paper, we also derive the Cramér-Rao lower bound (CRLB), and benchmark it against the proposed method in a series of Monte Carlo (MC) simulations. Results show that the proposed method attains high-performance comparable to the CRLB.
In this paper, we propose a directional signal extraction network (DSENet). DSENet is a low-latency, real-time model that, given a reverberant mixture of signals captured by a microphone array, aims at extracting the reverberant signal whose source is located within a predefined directional region with respect to the array. If there are multiple sources situated within the directional region of interest, DSENet will aim at extracting a linear combination of their reverberant signals. As such, the formulation of DSENet circumvents the well-known crosstalk problem in beamforming while providing an alternative and perhaps more practical approach to other spatially constrained signal extraction methods proposed in the literature. DSENet is based on a computationally efficient and low-distortion linear model formulated in the time domain. As a result, an important application of our work is hearing improvement on edge devices. Simulation results show that DSENet outperforms oracle beamformers, as well as state-of-the-art in lowlatency causal speech separation, while incurring a system latency of only 4 ms. Additionally, DSENet has been successfully deployed as a real-time application on a smartphone.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.