ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413923
|View full text |Cite
|
Sign up to set email alerts
|

Supervised Direct-Path Relative Transfer Function Learning for Binaural Sound Source Localization

Abstract: Direct-path relative transfer function (DP-RTF) refers to the ratio between the direct-path acoustic transfer functions of two channels. Though DP-RTF fully encodes the sound directional cues and serves as a reliable localization feature, it is often erroneously estimated in the presence of noise and reverberation. This paper proposes a supervised DP-RTF learning method with deep neural networks for robust binaural sound source localization. To exploit the complementarity of single-channel spectrogram and dual… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2

Citation Types

0
2
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
3
2

Relationship

1
4

Authors

Journals

citations
Cited by 5 publications
(2 citation statements)
references
References 23 publications
0
2
0
Order By: Relevance
“…The DP-RTF learning based localization method takes full use of the spatial and spectral cues, which is demonstrated to perform better than several other methods on both simulated and real-world data in the noisy and reverberant environment. The proposed method is an extended version of our previous work [15], which has the following contributions.…”
Section: Introductionmentioning
confidence: 99%
“…The DP-RTF learning based localization method takes full use of the spatial and spectral cues, which is demonstrated to perform better than several other methods on both simulated and real-world data in the noisy and reverberant environment. The proposed method is an extended version of our previous work [15], which has the following contributions.…”
Section: Introductionmentioning
confidence: 99%
“…Recently, Deep Neural Networks (DNNs) has been used to learn the relationship between azimuth and binaural cues, by exploiting head movements to resolve the front-back ambiguity (Ma et al, 2017 ), and by combining spectral source models to robustly localize the target source in a multiple sources scenario (Ma et al, 2018 ). Additionally, a few works use DNNs to enhance the binaural features so that they can eliminate reverberation and additive noise (Pak and Shin, 2019 ; Yang et al, 2021 ). In Yalta et al ( 2017 ) and Vecchiotti et al ( 2019 ), the authors utilize DNNs to directly map the audio spectrogram or its raw waveform to the source azimuth in an end-to-end manner, which is also applicable to reverberant and noisy environments.…”
Section: Introductionmentioning
confidence: 99%