ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683732

End-to-end Binaural Sound Localisation from the Raw Waveform

Abstract: A novel end-to-end binaural sound localisation approach is proposed which estimates the azimuth of a sound source directly from the waveform. Instead of relying on hand-crafted features commonly used for binaural sound localisation, such as interaural time and level differences (ITD and ILD), our end-to-end approach uses a convolutional neural network (CNN) to extract features suited to localisation directly from the waveform. Two systems are proposed, which differ in the initial frequency analysis s…
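For contrast with the end-to-end approach, the hand-crafted cues the abstract mentions can be computed in a few lines. The sketch below estimates a broadband interaural time difference (ITD) from the peak of the left/right cross-correlation and an interaural level difference (ILD) from the channel energy ratio. It is an illustration only, not the paper's baseline: the function name `itd_ild`, the 1 ms lag limit and the sign convention are assumptions, and practical systems usually compute these cues per frequency band rather than broadband.

```python
import numpy as np

def itd_ild(left, right, fs, max_lag_s=1e-3):
    """Estimate broadband ITD (seconds) and ILD (dB) for one binaural frame.

    Assumed convention: positive ITD means the right channel lags the left,
    positive ILD means the left channel is louder.
    """
    max_lag = int(round(max_lag_s * fs))
    # Full cross-correlation c[k] = sum_n right[n + k] * left[n]
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    # Search only physically plausible lags (about +/- 1 ms for a human head)
    valid = np.abs(lags) <= max_lag
    best_lag = lags[valid][np.argmax(xcorr[valid])]
    itd = best_lag / fs
    eps = 1e-12  # avoids log of zero for silent frames
    ild = 10.0 * np.log10((np.sum(left**2) + eps) / (np.sum(right**2) + eps))
    return itd, ild

rng = np.random.default_rng(0)
fs = 16000
left = rng.standard_normal(1024)
# Synthetic right-ear signal: 5-sample delay and half the amplitude
right = 0.5 * np.concatenate([np.zeros(5), left[:-5]])
itd, ild = itd_ild(left, right, fs)
print(itd * 1e6, ild)  # roughly 312.5 microseconds and about 6 dB
```

An end-to-end CNN replaces both the cross-correlation and the energy ratio with learned filters, which is the design choice the abstract argues for.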

Cited by 57 publications (49 citation statements). References: 22 publications.
“…When the proposed WM softmax loss is replaced with a regular softmax loss, the output of the proposed network will not be the DP-RTF feature anymore. Instead, the network will output the class of source direction, as in many deep-classification-based sound source localization methods, such as [5,25]. With 5° error tolerance, the MSE loss slightly outperforms the softmax loss in most cases, but the softmax loss performs relatively better with 0° error tolerance.…”
Section: Results (mentioning, confidence: 99%)
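The statement above compares losses by "accuracy with an error tolerance". A minimal sketch of how such a metric is commonly computed follows: for a classification network, the predicted azimuth is the class with the highest score, and an estimate counts as correct if its absolute error is at most the tolerance. The function names, the azimuth grid and the wrap-around of errors into [-180°, 180°) are assumptions for illustration; the citing paper's exact evaluation protocol is not given here.

```python
import numpy as np

def softmax_to_azimuth(scores, azimuth_grid):
    """Classification decoding: pick the azimuth class with the highest score.

    Softmax is monotonic, so argmax over logits equals argmax over probabilities.
    """
    return np.asarray(azimuth_grid)[np.argmax(scores, axis=-1)]

def accuracy_within_tolerance(pred_deg, true_deg, tol_deg):
    """Fraction of estimates whose absolute azimuth error is <= tol_deg."""
    pred = np.asarray(pred_deg, dtype=float)
    true = np.asarray(true_deg, dtype=float)
    # Assumed: wrap errors into [-180, 180) so 355 vs 5 counts as 10 degrees
    err = (pred - true + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(err) <= tol_deg))

# Usage: three estimates scored at 5 and 0 degrees of tolerance
print(accuracy_within_tolerance([0, 10, 20], [0, 5, 30], 5))
print(accuracy_within_tolerance([0, 10, 20], [0, 5, 30], 0))
```

With 0° tolerance only exact class hits count, which is where a classification (softmax) output is naturally favoured; a looser tolerance also rewards near misses from a regression-style (MSE) output.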
“…It can be seen that localization accuracies differ between HRTFs, and the accuracies on the diagonal correspond to the matched HRTF condition and all reach 100%. Off the diagonal, the localization accuracy at (21,12) is also 100%, which indicates that HRTF 12 and HRTF 21 are highly similar and that the DNNs trained on each can be substituted for each other.…”
Section: Localization Similarity Between HRTFs (mentioning, confidence: 97%)
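The cross-HRTF analysis above reads a matrix in which entry (i, j) is the accuracy of a model trained with one HRTF and tested with another. A small sketch of scanning such a matrix for off-diagonal entries that reach a threshold follows. Which index is the training HRTF and which is the test HRTF is an assumption here, as are the 0-based indices and the helper name `interchangeable_pairs`; the optional `mutual` check, which requires both (i, j) and (j, i) to pass, is a stricter reading of "substituted for each other" added for illustration.

```python
import numpy as np

def interchangeable_pairs(acc, threshold=1.0, mutual=False):
    """Scan a square accuracy matrix for cross-HRTF pairs.

    Assumed layout: acc[i, j] = accuracy of a model trained on HRTF i
    and tested on HRTF j (0-based indices).
    """
    acc = np.asarray(acc, dtype=float)
    n = acc.shape[0]
    pairs = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal = matched HRTF condition
            if acc[i, j] < threshold:
                continue
            if mutual:
                # Require both directions; report each unordered pair once
                if i < j and acc[j, i] >= threshold:
                    pairs.append((i, j))
            else:
                pairs.append((i, j))
    return pairs
```

A one-directional hit shows that a model trained on HRTF i generalises to HRTF j; the mutual variant checks the reverse direction as well.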
“…In [20], a CNN-based sound localization method is proposed and shown to be robust to inter-subject and measurement variability, but this study focuses only on elevation localization. In [21], an end-to-end binaural sound localization approach is proposed, which estimates the azimuth directly from the waveform using a CNN. This approach is robust to reverberant conditions; however, its performance in the HRTF-mismatched condition is not studied.…”
Section: Introduction (mentioning, confidence: 99%)
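To make "estimates the azimuth directly from the waveform by CNN" concrete, here is a toy NumPy forward pass: a two-channel (left/right) raw waveform goes through a strided 1-D convolution acting as a learnable filterbank, a second convolution, global average pooling and a softmax over azimuth classes. This is not the architecture of [21] or of the paper on this page; the layer sizes, the 5° class grid from -90° to 90°, the helper names and the random (untrained) weights are all hypothetical, and training is omitted.

```python
import numpy as np

def conv1d_valid(x, w, stride=1):
    """1-D cross-correlation with 'valid' padding.

    x: (C_in, T) input; w: (C_out, C_in, K) kernels. Returns (C_out, T_out).
    """
    c_out, _, k = w.shape
    t_out = (x.shape[1] - k) // stride + 1
    y = np.empty((c_out, t_out))
    for t in range(t_out):
        seg = x[:, t * stride : t * stride + k]  # (C_in, K) window
        y[:, t] = np.tensordot(w, seg, axes=([1, 2], [0, 1]))
    return y

def init_params(n_classes, rng, n_filters=16, kernel=64, stride=8):
    """Hypothetical layer sizes with random, untrained weights."""
    return {
        "w1": rng.standard_normal((n_filters, 2, kernel)) / np.sqrt(2 * kernel),
        "w2": rng.standard_normal((n_filters, n_filters, 3)) / np.sqrt(3 * n_filters),
        "w_out": rng.standard_normal((n_classes, n_filters)) / np.sqrt(n_filters),
        "b_out": np.zeros(n_classes),
        "stride": stride,
    }

def azimuth_posterior(waveform, params):
    """waveform: (2, T) raw left/right samples -> probabilities over azimuth classes."""
    # Layer 1: learnable filterbank applied jointly to both ears, then ReLU
    h = np.maximum(conv1d_valid(waveform, params["w1"], params["stride"]), 0.0)
    # Layer 2: combines filter outputs across channels, then ReLU
    h = np.maximum(conv1d_valid(h, params["w2"]), 0.0)
    feat = h.mean(axis=1)  # global average pooling over time
    logits = params["w_out"] @ feat + params["b_out"]
    z = np.exp(logits - logits.max())  # numerically stable softmax
    return z / z.sum()

rng = np.random.default_rng(0)
azimuths = np.arange(-90, 91, 5)  # hypothetical 37-class azimuth grid
params = init_params(len(azimuths), rng)
x = rng.standard_normal((2, 4000))  # stand-in for one binaural frame
p = azimuth_posterior(x, params)
print(azimuths[np.argmax(p)])  # arbitrary, since the weights are untrained
```

Feeding both ears into the first convolution lets the filters respond to interaural timing and level relations directly, which is the role played by hand-crafted ITD and ILD features in conventional systems.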
“…Further, binaural detection is a highly specialised auditory function for which deficits have real-world consequences [25,26]. DNNs may offer the opportunity to bridge this gap between animal and human data, and as yet, the inner workings of DNNs constructed to handle binaural audio have scarcely been considered [27–29].…”
Section: Introduction (mentioning, confidence: 99%)