2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017
DOI: 10.1109/icassp.2017.7952151

Hearing in a shoe-box: Binaural source position and wall absorption estimation using virtually supervised learning

Abstract: This paper introduces a new framework for supervised sound source localization referred to as virtually-supervised learning. An acoustic shoe-box room simulator is used to generate a large number of binaural single-source audio scenes. These scenes are used to build a dataset of spatial binaural features annotated with acoustic properties such as the 3D source position and the walls' absorption coefficients. A probabilistic high- to low-dimensional regression framework is used to learn a mapping from these feat…

Cited by 6 publications (9 citation statements)
References 17 publications
“…This model was notably shown to realistically account for sound scattering due to the presence of objects, by comparing simulated RIRs with measured ones in [22]. The study [8] suggests that such diffusion effects play an important role in sound source localization. VAST KEMAR 0 contains over 110,000 RIRs, which required about 700 CPU-hours of computation.…”
Section: Room Simulation and Data Generation (mentioning)
confidence: 98%
“…Finally, the integration with the VAST toolbox [16] allows the user to easily generate arbitrarily large datasets of BRIRs. The provided MATLAB scripts allow the user to: (i) initialize an empty VAST structure, (ii) define the room acoustic conditions and (iii) automatically populate the dataset with metadata while calling SofaMyRoom to generate and store BRIRs.…”
Section: Functionalities (mentioning)
confidence: 99%
“…Data-driven and machine-hearing systems are becoming a key component in audio signal processing research, but they require large amounts of labelled data in order to be deployed [16]. On the other hand, the process of recording such data from a real environment involves manual operations that are time-consuming and error-prone.…”
Section: Impact (mentioning)
confidence: 99%
“…For the final evaluation, we use standard metrics like Equal Error Rate (EER) and Minimum Decision Cost Function (minDCF) at target prior p = 0.05 (NIST SRE18 VAST operating point). The code for this work is available online and a parent paper is submitted in parallel [22].…”
Section: Evaluation Details (mentioning)
confidence: 99%
“…Various phenomena degrade speech, such as noise, reverberation, speaker movement, device orientation, and room characteristics [1]. This makes the deployment of Speaker Verification (SV) systems challenging.…”
Section: Introduction (mentioning)
confidence: 99%