Supervised Direct-Path Relative Transfer Function Learning for Binaural Sound Source Localization

Yang, Bing; Li, Xiaofei; Liu, Hong

doi:10.1109/icassp39728.2021.9413923

Cited by 5 publications

(2 citation statements)

References 23 publications

“…The DP-RTF learning based localization method takes full use of the spatial and spectral cues, which is demonstrated to perform better than several other methods on both simulated and real-world data in the noisy and reverberant environment. The proposed method is an extended version of our previous work [15], which has the following contributions.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization

Yang

Liu

2022

Preprint

Self Cite

Direct-path relative transfer function (DP-RTF) refers to the ratio between the direct-path acoustic transfer functions of two microphone channels. Though DP-RTF fully encodes the sound spatial cues and serves as a reliable localization feature, it is often erroneously estimated in the presence of noise and reverberation. This paper proposes to learn DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of DP-RTF. It consists of a branched convolutional neural network module to separately extract the inter-channel magnitude and phase patterns, and a convolutional recurrent neural network module for joint feature learning. To better explore the speech spectra to aid the DP-RTF estimation, a monaural speech enhancement network is used to recover the direct-path spectrograms from the noisy ones. The enhanced spectrograms are stacked onto the noisy spectrograms to act as the input of the DP-RTF learning network. We train one unique DP-RTF learning network using many different binaural arrays to enable the generalization of DP-RTF learning across arrays. This way avoids time-consuming training data collection and network retraining for a new array, which is very useful in practical application. Experimental results on both simulated and real-world data show the effectiveness of the proposed method for direction of arrival (DOA) estimation in the noisy and reverberant environment, and a good generalization ability to unseen binaural arrays.
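The definition at the start of this abstract (DP-RTF as the ratio between the direct-path acoustic transfer functions of two channels) can be illustrated with a small sketch. It assumes a free-field model in which each direct path is a pure delay with 1/distance attenuation; the function names, the distances, and the choice of stacking real and imaginary parts as the "real-valued representation" are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def direct_path_atf(freqs_hz, distance_m, speed_of_sound=343.0):
    """Assumed free-field direct path: a pure delay with 1/distance attenuation."""
    delay_s = distance_m / speed_of_sound
    return np.exp(-2j * np.pi * freqs_hz * delay_s) / distance_m

def dp_rtf(freqs_hz, dist_ref_m, dist_other_m):
    """Ratio of the second channel's direct-path ATF to the reference channel's."""
    return direct_path_atf(freqs_hz, dist_other_m) / direct_path_atf(freqs_hz, dist_ref_m)

def real_valued(rtf):
    """One possible real-valued encoding (an assumption): stack real and imaginary parts."""
    return np.concatenate([rtf.real, rtf.imag])

# Hypothetical geometry: the source is 2.00 m from the reference microphone
# and 2.10 m from the other microphone.
freqs = np.linspace(100.0, 4000.0, 5)
rtf = dp_rtf(freqs, dist_ref_m=2.00, dist_other_m=2.10)

level_difference_db = 20 * np.log10(np.abs(rtf))  # inter-channel level difference
phase_difference = np.angle(rtf)                   # inter-channel phase difference (wrapped)
target = real_valued(rtf)                          # vector a network could regress to
```

Under this model the magnitude of the ratio is constant across frequency and the phase grows linearly with frequency, which is why the ratio carries both level and time-difference cues.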

“…Recently, Deep Neural Networks (DNNs) has been used to learn the relationship between azimuth and binaural cues, by exploiting head movements to resolve the front-back ambiguity (Ma et al, 2017), and by combining spectral source models to robustly localize the target source in a multiple sources scenario (Ma et al, 2018). Additionally, a few works use DNNs to enhance the binaural features so that they can eliminate reverberation and additive noise (Pak and Shin, 2019; Yang et al, 2021). In Yalta et al (2017) and Vecchiotti et al (2019), the authors utilize DNNs to directly map the audio spectrogram or its raw waveform to the source azimuth in an end-to-end manner, which is also applicable to reverberant and noisy environments.…”

Section: Introduction (mentioning)

confidence: 99%

Toward learning robust contrastive embeddings for binaural sound source localization

Tang

Taseska

Waterschoot

2022

Front. Neuroinform.

Recent deep neural network based methods provide accurate binaural source localization performance. These data-driven models map measured binaural cues directly to source locations hence their performance highly depend on the training data distribution. In this paper, we propose a parametric embedding that maps the binaural cues to a low-dimensional space where localization can be done with a nearest-neighbor regression. We implement the embedding using a neural network, optimized to map points that are close to each other in the latent space (the space of source azimuths or elevations) to nearby points in the embedding space, thus the Euclidean distances between the embeddings reflect their source proximities, and the structure of the embeddings forms a manifold, which provides interpretability to the embeddings. We show that the proposed embedding generalizes well in various acoustic conditions (with reverberation) different from those encountered during training, and provides better performance than unsupervised embeddings previously used for binaural localization. In addition, the proposed method performs better than or equally well as a feed-forward neural network based model that directly estimates the source locations from the binaural cues, and it has better results than the feed-forward model when a small amount of training data is used. Moreover, we also compare the proposed embedding using both supervised and weakly supervised learning, and show that in both conditions, the resulting embeddings perform similarly well, but the weakly supervised embedding allows to estimate source azimuth and elevation simultaneously.
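The localization step described in this abstract (nearest-neighbor regression in a low-dimensional embedding space) can be sketched as follows. The learned network is not reproduced here: `toy_embedding` is a hypothetical stand-in that places azimuths on a unit circle so that nearby azimuths map to nearby points, and `knn_regress` with `k=3` is an illustrative choice rather than the authors' configuration.

```python
import numpy as np

def knn_regress(train_emb, train_azimuth_deg, query_emb, k=3):
    """Estimate azimuth as the mean label of the k nearest training embeddings."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(query_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Plain averaging is adequate here because the labels stay within
    # [-90, 90] degrees, so no angle wrap-around occurs.
    return train_azimuth_deg[nearest].mean(axis=1)

def toy_embedding(azimuth_deg):
    """Hypothetical stand-in for a learned embedding: points on the unit circle."""
    rad = np.deg2rad(azimuth_deg)
    return np.stack([np.cos(rad), np.sin(rad)], axis=-1)

# Training set: azimuths from -90 to 90 degrees in 10-degree steps.
train_az = np.arange(-90.0, 91.0, 10.0)
train_emb = toy_embedding(train_az)

# Two query points whose true azimuths are 20 and -50 degrees.
query_emb = toy_embedding(np.array([20.0, -50.0]))
estimate = knn_regress(train_emb, train_az, query_emb, k=3)
```

Because Euclidean distance in the embedding space tracks source proximity, the neighbors of each query are the training points with adjacent azimuths, and their mean recovers the query azimuth.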

SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization

Yang

Liu

2022

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
