2020
DOI: 10.1186/s13636-020-0171-y

Binaural sound localization based on deep neural network and affinity propagation clustering in mismatched HRTF condition

Abstract: Binaural sound source localization is an important and widely used perceptually based method, and many researchers have applied it to machine learning studies based on the head-related transfer function (HRTF). Because the HRTF is closely related to human physiological structure, HRTFs vary between individuals. Related machine learning studies to date tend to focus on binaural localization in reverberant or noisy environments, or in conditions with multiple simultaneously active sound sources. In contrast…
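Since the abstract pairs a DNN with affinity propagation clustering, a minimal sketch of the clustering stage may help readers unfamiliar with the algorithm. This is an illustration only, assuming scikit-learn's AffinityPropagation and hypothetical frame-level binaural feature vectors; it is not the authors' implementation.

```python
# Sketch: affinity propagation selects exemplars by message passing,
# so the number of clusters need not be fixed in advance -- convenient
# when the set of active sources/HRTF conditions is unknown.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# Hypothetical stand-in for frame-level binaural features
# (e.g. ITD/ILD per frequency band): shape (n_frames, n_features).
features = rng.normal(size=(200, 16))

ap = AffinityPropagation(damping=0.9, max_iter=1000, random_state=0)
ap.fit(features)
print("number of clusters found:", len(ap.cluster_centers_indices_))
print("labels of first 10 frames:", ap.labels_[:10])
```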

Cited by 29 publications (24 citation statements). References: 30 publications.
“…Building on Rumsey's spatial audio scene-based paradigm [3], complex spatial audio scenes can be described at the following three hierarchical levels: (1) the low level of individual audio sources, (2) the mid level of ensembles of sources, and (3) the high level of acoustical environments. However, the state-of-the-art computational models for binaural localization developed so far were intended to localize individual audio sources [4][5][6][7][8][9] rather than to characterize complex spatial audio scenes at various descriptive levels (see [10] for a review of binaural localization models). They were designed using predominantly speech signals and were intended to localize speakers [7].…”
Section: Introduction
confidence: 99%
“…Likewise, there are only a few developments [13] aiming to characterize complex spatial audio scenes at the mid or high level, using the hierarchical paradigm described above [3]. Moreover, most of the binaural localization models developed so far are constrained to 2D localization in the horizontal plane [4][5][6][7][8][9]. Some preliminary models allowing for full-sphere binaural sound source localization have been proposed only recently [14,15].…”
Section: Introduction
confidence: 99%
“…The AoA estimation is performed on the inter-channel phase differences produced by a deep neural network-based phase difference enhancement [27]. In the mismatched head-related transfer function condition, a data-efficient clustering method based on a deep neural network is provided to improve binaural localization performance [28].…”
Section: Introduction
confidence: 99%
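The inter-channel phase differences mentioned in the quote above can be computed directly from paired short-time Fourier transforms. Below is a minimal sketch with an assumed toy signal pair; the DNN-based enhancement itself is omitted, and the 0.2 ms delay is purely illustrative.

```python
# Sketch: inter-channel phase difference (IPD) features from the
# STFTs of the left- and right-ear signals.
import numpy as np
from scipy.signal import stft

fs = 16000
t = np.arange(fs) / fs
left = np.sin(2 * np.pi * 440 * t)            # toy left-ear signal
right = np.sin(2 * np.pi * 440 * (t - 2e-4))  # right ear delayed 0.2 ms

_, _, L = stft(left, fs=fs, nperseg=512)
_, _, R = stft(right, fs=fs, nperseg=512)

# Phase difference per time-frequency bin, wrapped to (-pi, pi].
ipd = np.angle(L * np.conj(R))
print("IPD feature map shape (freq bins, frames):", ipd.shape)
```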
“…Chan et al. proposed a robotic sound localisation system using a WTA network to estimate the direction of a sound source through the ITD cues from a cochlea pair with an address event representation (AER) interface [27]. In recent years, deep neural networks (DNN) have provided more accurate estimations of sound source locations from binaural cues [29][30][31][32]. For example, in S. Jiang et al. [32], simulated binaural signals were pre-processed with a Gammatone filter bank and used to train a DNN classifier for sound source localisation.…”
Section: Introduction
confidence: 99%
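As a rough illustration of the gammatone front-end described in the quote above, the sketch below extracts two classic binaural cues from an assumed toy signal pair: a broadband ITD via cross-correlation, and per-band ILDs from a gammatone filter bank (scipy.signal.gammatone, SciPy >= 1.6). The band centers and the 3-sample/-6 dB offsets are invented for the example; the DNN classifier of [32] is omitted. Such per-band cue vectors would typically be stacked as the classifier's input features.

```python
import numpy as np
from scipy.signal import gammatone, lfilter

fs = 16000
rng = np.random.default_rng(0)
left = rng.normal(size=fs // 4)    # toy noise burst, left ear
right = 0.5 * np.roll(left, 3)     # right ear: quieter, lags 3 samples

# Broadband ITD: the cross-correlation peaks at the 3-sample lag.
xcorr = np.correlate(left, right, mode="full")
lag = (left.size - 1) - np.argmax(xcorr)  # positive = right lags left
print("broadband ITD:", lag / fs, "s")    # ~1.9e-4 s

# Per-band ILDs after a gammatone filter bank; filtering applies the
# same gain to both channels, so the level difference is preserved.
for fc in [250, 500, 1000, 2000, 4000]:   # assumed band centers (Hz)
    b, a = gammatone(fc, "fir", numtaps=256, fs=fs)
    l_band = lfilter(b, a, left)
    r_band = lfilter(b, a, right)
    ild_db = 10 * np.log10(np.sum(r_band**2) / np.sum(l_band**2))
    print(f"{fc:5d} Hz band ILD: {ild_db:6.2f} dB")  # ~ -6 dB per band
```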