ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414415

Synthetic Data for DNN-Based DOA Estimation of Indoor Speech

Abstract: This paper investigates the use of different room impulse response (RIR) simulation methods for synthesizing training data for deep neural network-based direction of arrival (DOA) estimation of speech in reverberant rooms. Different sets of synthetic RIRs are obtained using the image source method (ISM) and more advanced methods including diffuse reflections and/or source directivity. Multi-layer perceptron (MLP) deep neural network (DNN) models are trained on generalized cross correlation (GCC) features extrac…
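
As a rough illustration of the GCC features mentioned in the abstract, the sketch below computes a cross-correlation vector between two microphone signals and maps its peak to a far-field angle. It is not the paper's pipeline: it assumes unit (no) frequency weighting, a single two-microphone pair, and a classical peak-picking step where the paper trains an MLP. The function names, the sampling rate, and the microphone spacing are hypothetical.

```python
import math


def gcc_unweighted(x, y, max_lag):
    """Cross-correlation r[lag] = sum_i x[i] * y[i + lag] for lags
    -max_lag..max_lag. This is GCC with unit weighting (assumption:
    the source does not state which weighting the paper uses)."""
    n = min(len(x), len(y))
    feats = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += x[i] * y[j]
        feats.append(acc)
    return feats


def estimate_delay(x, y, max_lag):
    """Lag (in samples) at which the correlation peaks; positive
    means y is a delayed copy of x."""
    feats = gcc_unweighted(x, y, max_lag)
    return feats.index(max(feats)) - max_lag


def delay_to_angle(delay_samples, fs, mic_distance, c=343.0):
    """Far-field angle in degrees relative to broadside for one
    microphone pair: theta = asin(c * tau / d), clamped to [-90, 90]."""
    s = c * (delay_samples / fs) / mic_distance
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.asin(s))


# Toy signals: an impulse and a copy delayed by 3 samples.
x = [0.0] * 10 + [1.0] + [0.0] * 20
y = [0.0] * 3 + x[:-3]
delay = estimate_delay(x, y, max_lag=8)
angle = delay_to_angle(delay, fs=16000, mic_distance=0.1)
```

In a learning-based setup such as the one the abstract describes, the vector returned by `gcc_unweighted` (one per microphone pair) would serve as the network input rather than being reduced to a single peak.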

Cited by 10 publications (5 citation statements)
References 16 publications
“…We relied on the DNS Challenge 2021 speech and noise data, as it is a high quality database that covers multiple languages and many different types of noises. For the RIRs, we used the ISM-dir dataset described in [30]. These RIRs are simulated using the image source method with the addition that all speaker sources are modelled as directive sources with an average male/female speaker pattern directivity.…”
Section: Training Data (mentioning)
confidence: 99%
“…RIRs were then recorded with the same microphone array in the same room, at various speaker positions and orientations. More details on how these RIRs were obtained can be found in [30]. We included both the RIR recordings for speakers facing the array, and the RIRs for speakers facing away at a 90 degree angle.…”
Section: Evaluation, A. Evaluation Data (mentioning)
confidence: 99%
“…We relied on the DNS Challenge 2021 speech and noise data, as it is a high quality database that covers multiple languages and many different types of noises. For the RIRs, we used the ISM-dir dataset described in [27]. These RIRs are simulated using the image source method with the addition that all speaker sources are modelled as directive sources with an average male/female speaker pattern directivity.…”
Section: Training Data (mentioning)
confidence: 99%
“…RIRs were then recorded with the same microphone array in the same room, at various speaker positions and orientations. More details on how these RIRs were obtained can be found in [27]. We included both the RIR recordings for speakers looking towards the array, and the RIRs for speakers looking away at a 90 degree angle.…”
Section: Evaluation, A. Evaluation Data (mentioning)
confidence: 99%
“…This method shows similar performance compared to the usual ISM, while being computationally more efficient. An investigation of several simulation methods has been done in [234], with extensions of ISM, namely ISM with directional sources, and ISM with a diffuse field due to scattering. The authors of [234] compared the simulation algorithms via the training of an MLP (in both regression and classification modes) and showed that ISM with scattering effect and directional sources leads to the best SSL performance.…”
Section: A. Synthetic Data (mentioning)
confidence: 99%