2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8462315

Spatial Audio Feature Discovery with Convolutional Neural Networks

Abstract: The advent of mixed reality consumer products brings about a pressing need to develop and improve spatial sound rendering techniques for a broad user base. Despite a large body of prior work, the precise nature and importance of various sound localization cues and how they should be personalized for an individual user to improve localization performance is still an open research problem. Here we propose training a convolutional neural network (CNN) to classify the elevation angle of spatially rendered sounds a…

Citations: cited by 28 publications (24 citation statements)
References: 26 publications (34 reference statements)
“…Layerwise relevance propagation (LRP) [32] is a visualization technique which highlights the input features that are relevant for a given output. It can bring new information on the input features, for example in binaural localization where it has been used to identify the relevant elevation cues for a neural network, which can then be compared to human localization [33]. In addition, it is of paramount importance to check that the performance of the network is based on robust reasoning and not, for example, a bias in the dataset, which is made possible by LRP [34].…”
Section: Introduction (mentioning)
confidence: 99%
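
The relevance-propagation idea described in the statement above can be illustrated with a small sketch. It assumes the epsilon rule of LRP applied to a bias-free, two-layer ReLU network; the weights, input values, and function names (`matvec`, `relu`, `lrp_dense`) are hypothetical and are not taken from the cited papers or their networks.

```python
# Minimal LRP (epsilon rule) sketch for a tiny two-layer ReLU network.
# All weights and inputs are made-up illustrative values.

def matvec(W, x):
    """Return W @ x for a list-of-rows matrix W and a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Element-wise rectifier."""
    return [max(0.0, a) for a in v]

def lrp_dense(W, a_in, r_out, eps=1e-9):
    """Redistribute relevance r_out of a bias-free dense layer onto its inputs a_in."""
    z = matvec(W, a_in)  # pre-activations z_j = sum_i W[j][i] * a_in[i]
    # Keep the denominator away from zero while preserving its sign.
    s = [r / (zj + (eps if zj >= 0 else -eps)) for r, zj in zip(r_out, z)]
    # R_i = a_i * sum_j W[j][i] * s_j
    return [a_in[i] * sum(W[j][i] * s[j] for j in range(len(W)))
            for i in range(len(a_in))]

W1 = [[1.0, -1.0, 0.5],
      [0.0, 2.0, -1.0]]  # hidden layer: 3 inputs -> 2 units (assumed values)
W2 = [[1.0, 0.5]]        # output layer: 2 units -> 1 output (assumed values)

x = [1.0, 0.5, 2.0]      # toy input features
h = relu(matvec(W1, x))  # hidden activations
y = matvec(W2, h)        # network output

# Start with relevance equal to the output and propagate it back layer by layer.
r_hidden = lrp_dense(W2, h, y)
r_input = lrp_dense(W1, x, r_hidden)
```

In this toy setting the input relevances sum (up to the epsilon term) to the output value, and their signs indicate which input features support or oppose the output.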
“…Furthermore, alternative feature representations, e.g., the echo density profile or the phase spectrum [28] could be studied. Neural network explanatory techniques [29] may allow to identify the most suitable spectro-temporal signal representation for blind volume estimation.…”
Section: Discussion (mentioning)
confidence: 99%
“…While it works in the matched HRTF condition. In [20], a CNN-based sound localization method is proposed and proved to be robust to inter-subject and measurement variability, but this study only focuses on elevation localization. In [21], an end-to-end binaural sound localization approach is proposed, which estimates the azimuth directly from the waveform by CNN.…”
Section: Introduction (mentioning)
confidence: 99%