2018
DOI: 10.1109/taslp.2018.2855960

Robust Binaural Localization of a Target Sound Source by Combining Spectral Source Models and Deep Neural Networks

Abstract: Despite there being clear evidence for top-down (e.g., attentional) effects in biological spatial hearing, relatively few machine hearing systems exploit top-down model-based knowledge in sound localisation. This paper addresses this issue by proposing a novel framework for binaural sound localisation that combines model-based information about the spectral characteristics of sound sources and deep neural networks (DNNs). A target source model and a background source model are first estimated during a training…
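
As a rough illustration of the kind of combination the abstract describes, the sketch below (Python/NumPy, not the authors' implementation; array shapes, variable names, and the pooling rule are assumptions) weights per time-frequency (T-F) unit DNN azimuth posteriors by a soft target-dominance mask such as a spectral source model might provide, then pools them into a single azimuth estimate.

```python
import numpy as np

def estimate_azimuth(dnn_posteriors, target_mask, azimuth_grid):
    """Hypothetical sketch: fuse per-T-F-unit DNN azimuth posteriors with a
    soft target-dominance mask derived from spectral source models.

    dnn_posteriors : (num_tf_units, num_azimuths) softmax outputs of the DNN
    target_mask    : (num_tf_units,) probability that each T-F unit is
                     dominated by the target source (from the source models)
    azimuth_grid   : (num_azimuths,) candidate azimuths in degrees
    """
    # Weight each T-F unit's posterior by how target-dominated it is,
    # then pool across units to obtain a single azimuth distribution.
    weighted = target_mask[:, None] * dnn_posteriors
    pooled = weighted.sum(axis=0)
    pooled /= pooled.sum() + 1e-12          # renormalise
    return azimuth_grid[np.argmax(pooled)], pooled

# Toy usage with random numbers (illustration only).
rng = np.random.default_rng(0)
azimuths = np.arange(0, 360, 5)                              # 72 candidate directions
posteriors = rng.dirichlet(np.ones(len(azimuths)), size=200)  # 200 T-F units
mask = rng.uniform(size=200)
best_azimuth, distribution = estimate_azimuth(posteriors, mask, azimuths)
print(f"Estimated azimuth: {best_azimuth} degrees")
```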

Cited by 54 publications (55 citation statements)
References 28 publications
“…Besides, binaural models are apt to make use of spatial localization information to address the Cocktail Party problem. For instance, Ma et al. (2018) train DNNs to localize acoustic features over the full 360° azimuth range. After the training phase, the binaural localization with spectral features is used as prior knowledge in the top-down modulation of the model.…”
Section: Computational Models for the Human Cocktail Party Problem
confidence: 99%
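The "full 360° azimuth" formulation quoted above is typically framed as classification over a discrete azimuth grid. The minimal sketch below (layer sizes, feature dimension, and the 5° grid are assumptions, not details taken from the cited papers) shows a plain NumPy forward pass of such a softmax classifier.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_azimuth_posterior(features, weights):
    """Forward pass of a small MLP: binaural features -> azimuth posterior."""
    h = features
    for W, b in weights[:-1]:
        h = np.maximum(0.0, h @ W + b)        # ReLU hidden layers
    W_out, b_out = weights[-1]
    return softmax(h @ W_out + b_out)         # posterior over 72 azimuth classes

# Toy usage: random weights and one 34-dimensional binaural feature vector
# (e.g. a cross-correlation function plus an ILD value for one frequency channel).
rng = np.random.default_rng(1)
dims = [34, 128, 128, 72]                     # 72 classes = 5-degree resolution
weights = [(rng.normal(scale=0.1, size=(i, o)), np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]
posterior = mlp_azimuth_posterior(rng.normal(size=34), weights)
print("Most likely azimuth:", 5 * int(np.argmax(posterior)), "degrees")
```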
“…According to Rumsey's spatial audio scene-based paradigm [5], complex spatial audio scenes could be described using perceptual attributes describing (1) individual audio sources, (2) ensembles of sources, and (3) acoustic environments. However, most of the models, mimicking spatial hearing in humans, are limited to the localization of individual audio sources [6][7][8][9][10], ignoring higher-level attributes describing complex spatial audio scenes.…”
Section: Introduction
confidence: 99%
“…Early machine-listening algorithms, inspired by spatial hearing in humans, could be characterized as single-source localization models since they were capable of localization of only one audio source at a time [11,12]. More recently, multiple-source localization models have been developed [6][7][8][9], which constitutes an important step towards quantification of higher-level attributes (e.g., ensemble width), ultimately leading to a holistic characterization of complex spatial audio scenes. While their reported accuracy is deemed to be good, their applicability is limited, as they often require a priori knowledge about the number of sources of interest and their signal characteristics.…”
Section: Introduction
confidence: 99%
“…Later, Ma et al. [Ma, May and Brown (2017)] used DNNs and head rotation to achieve multi-source localization under reverberant conditions. Ma et al. [Ma, Gonzalez, Brown et al. (2018)] also combined spectral information and DNNs at the time-frequency (TF) unit level to estimate azimuth. Yiwere et al. [Yiwere and Rhee (2017)] used ILD and CCF as input features to train DNN models.…”
Section: Introduction
confidence: 99%
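The ILD and CCF features mentioned in the last citation can be illustrated with a short sketch. The frame length, lag range, and normalisation below are assumptions chosen for the example, not the settings used by the cited authors.

```python
import numpy as np

def ccf_ild_features(left, right, max_lag=16, eps=1e-12):
    """Return CCF values for lags in [-max_lag, max_lag] plus the ILD in dB."""
    left = left - left.mean()
    right = right - right.mean()
    full = np.correlate(left, right, mode="full")          # length 2N-1
    mid = len(left) - 1                                     # zero-lag index
    ccf = full[mid - max_lag : mid + max_lag + 1]
    ccf = ccf / (np.sqrt(np.sum(left**2) * np.sum(right**2)) + eps)
    ild = 10.0 * np.log10((np.sum(left**2) + eps) / (np.sum(right**2) + eps))
    return np.concatenate([ccf, [ild]])

# Toy usage: one 20 ms frame at 16 kHz with a 5-sample interaural delay.
rng = np.random.default_rng(2)
src = rng.normal(size=325)
left_frame, right_frame = src[5:325], 0.7 * src[0:320]
features = ccf_ild_features(left_frame, right_frame)
print("Feature vector length:", len(features))              # 2*16 + 1 + 1 = 34
```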