Reverberant sound localization with a robot head based on direct-path relative transfer function

Li, Xiaofei; Girin, Laurent; Badeig, Fabien; Horaud, Radu

doi:10.1109/iros.2016.7759437

Cited by 26 publications

(20 citation statements)

References 33 publications


“…at the beginning of Speaker 3's trajectory (in blue). The possible reasons are i) the NAO robot (v5) has a relative strong egonoise [38], and thus the signal-to-noise ratio of the recorded signals is relative low, and ii) the speakers are moving with a varying source-to-robot distance and the direct-path speech is contaminated by more reverberations when the speakers are distant. Overall, DPRTF-REM and DPRTF-EG are able to monitor the moving, appearance, and disappearance of active speakers for most of the time, with a small time lag due to the temporal smoothing.…”

Section: B Results For Locata Dataset (mentioning)

confidence: 99%

Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

Li, Ban, Girin et al. 2019

IEEE J. Sel. Top. Signal Process.

Self Cite


We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. The paper has the following contributions. We use the direct-path relative transfer function (DP-RTF), an interchannel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign DP-RTFs to audio-source directions. Towards this goal, we adopt a maximum-likelihood formulation and we propose to use exponentiated gradient (EG) to efficiently update source-direction estimates starting from their currently available values. The problem of multiple speaker tracking is computationally intractable because the number of possible associations between observed source directions and physical speakers grows exponentially with time. We adopt a Bayesian framework and we propose a variational approximation of the posterior filtering distribution associated with multiple speaker tracking, as well as an efficient variational expectation maximization (VEM) solver. The proposed online localization and tracking method is thoroughly evaluated using two datasets that contain recordings performed in real environments.
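The exponentiated gradient (EG) update mentioned in this abstract can be illustrated with a minimal sketch. It assumes the source-direction estimate is a weight vector over a fixed grid of candidate directions, constrained to the probability simplex, and that the objective is a mixture log-likelihood. The function names, the toy likelihood table, and the step size are hypothetical and are not taken from the paper.

```python
import math

def eg_step(weights, gradient, step_size=0.1):
    """One exponentiated-gradient ascent step; the result stays on the simplex."""
    scaled = [w * math.exp(step_size * g) for w, g in zip(weights, gradient)]
    total = sum(scaled)
    return [s / total for s in scaled]

def mixture_loglik_gradient(weights, likelihoods):
    """Gradient of the mean of log(sum_d w_d * p[n][d]) with respect to the weights."""
    grad = [0.0] * len(weights)
    for row in likelihoods:
        mix = sum(w * p for w, p in zip(weights, row))
        for d, p in enumerate(row):
            grad[d] += p / mix
    return [g / len(likelihoods) for g in grad]

# Toy data (assumed): likelihood of 3 observed features under 3 candidate
# directions; direction 1 explains the features best.
likelihoods = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.9, 0.0]]

# Start from the currently available estimate (here uniform) and refine it.
weights = [1 / 3, 1 / 3, 1 / 3]
for _ in range(50):
    weights = eg_step(weights, mixture_loglik_gradient(weights, likelihoods))
```

The multiplicative form keeps every weight non-negative and the normalization keeps the weights summing to one, so no explicit projection step is needed.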



“…Hospedales et al [16] proposed a Bayesian model-based audio-visual fusion framework to segment, associate, and track multiple objects in audiovisual sequences. Li et al presented an SSL-based HRI system in [17]. They calibrated the sound sources' corresponding pixel coordinates.…”

Section: B Audio-visual Fusion Methods (mentioning)

confidence: 99%

“…The audio-visual fusion works [17], [18], [19], all use static robots to track the observer. For the moving robot, in [20], Evers et al proposed an acoustic SLAM framework that is different from the general concept of SLAM.…”

Section: B Audio-visual Fusion Methods (mentioning)

confidence: 99%

AcousticFusion: Fusing Sound Source Localization to Visual SLAM in Dynamic Environments

Zhang et al. 2021

Preprint

Self Cite


Dynamic objects in the environment, such as people and other agents, lead to challenges for existing simultaneous localization and mapping (SLAM) approaches. To deal with dynamic environments, computer vision researchers usually apply some learning-based object detectors to remove these dynamic objects. However, these object detectors are computationally too expensive for mobile robot on-board processing. In practical applications, these objects output noisy sounds that can be effectively detected by on-board sound source localization. The directional information of the sound source object can be efficiently obtained by direction of sound arrival (DoA) estimation, but the depth estimation is difficult. Therefore, in this paper, we propose a novel audio-visual fusion approach that fuses sound source direction into the RGB-D image and thus removes the effect of dynamic obstacles on the multi-robot SLAM system. Experimental results of multirobot SLAM in different dynamic environments show that the proposed method uses very small computational resources to obtain very stable self-localization results.


“…Experiments with real data are conducted using a version 5 NAO robot whose head has four microphones in a horizontal plane [22]. Thence we only perform 360 k is computed using the HRTF of NAO.…”

Section: Methods (mentioning)

confidence: 99%

Online Localization of Multiple Moving Speakers in Reverberant Environments

Li, Mourgue, Girin et al. 2018

2018 IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM)

Self Cite


This paper addresses the problem of online multiple moving speakers localization in reverberant environments. The direct-path relative transfer function (DP-RTF), as defined by the ratio between the first taps of the convolutive transfer function (CTF) of two microphones, encodes the inter-channel direct-path information and is thus used as a localization feature being robust against reverberation. The CTF estimation is based on the cross-relation method. In this work, the recursive least-square method is proposed to solve the cross-relation problem, due to its relatively low computational cost and its good convergence rate. The DP-RTF feature estimated at each time-frequency bin is assumed to correspond to a single speaker. A complex Gaussian mixture model is used to assign each observed feature to one among several speakers. The recursive expectation-maximization algorithm is adopted to update online the model parameters. The method is evaluated with a new dataset containing multiple moving speakers, where the ground-truth speaker trajectories are recorded with a motion capture system.
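The idea of obtaining the DP-RTF from the cross-relation with recursive least squares can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: a single frequency bin, two-tap CTFs `a1` and `a2`, and noise-free microphone signals. Dividing the cross-relation `x1 * a2 = x2 * a1` by `a1[0]` gives the regression `x2[p] = g[0]*x1[p] + g[1]*x1[p-1] + g[2]*x2[p-1]`, where `g[0] = a2[0]/a1[0]` is the DP-RTF. All names and the synthetic data are hypothetical.

```python
import random

def rls_dp_rtf(x1, x2, forgetting=0.98, delta=1e4):
    """Complex RLS for g = [a2[0], a2[1], -a1[1]] / a1[0]; g[0] is the DP-RTF."""
    dim = 3
    g = [0j] * dim
    # Inverse correlation matrix, initialized to a large multiple of identity.
    P = [[delta if i == j else 0j for j in range(dim)] for i in range(dim)]
    for p in range(1, len(x1)):
        u = [x1[p], x1[p - 1], x2[p - 1]]  # regressor for frame p
        uc = [v.conjugate() for v in u]
        Pu = [sum(P[i][j] * uc[j] for j in range(dim)) for i in range(dim)]
        denom = forgetting + sum(u[i] * Pu[i] for i in range(dim))
        k = [v / denom for v in Pu]  # gain vector
        err = x2[p] - sum(u[i] * g[i] for i in range(dim))  # a priori error
        g = [g[i] + k[i] * err for i in range(dim)]
        uP = [sum(u[i] * P[i][j] for i in range(dim)) for j in range(dim)]
        P = [[(P[i][j] - k[i] * uP[j]) / forgetting for j in range(dim)]
             for i in range(dim)]
    return g

def ctf(sig, taps):
    """Convolve a frame sequence with CTF taps (zero initial conditions)."""
    return [sum(taps[q] * sig[p - q] for q in range(len(taps)) if p - q >= 0)
            for p in range(len(sig))]

# Synthetic single-bin source and two assumed CTFs.
rng = random.Random(0)
s = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
a1 = [1.0 + 0j, 0.3 + 0j]
a2 = [0.5 + 0.5j, 0.2 + 0j]
x1, x2 = ctf(s, a1), ctf(s, a2)

g = rls_dp_rtf(x1, x2)
dp_rtf = g[0]  # estimate of a2[0] / a1[0]
```

Each new frame costs a fixed number of operations and the forgetting factor discounts old frames, which is what makes this kind of update attractive for moving sources.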

