Interspeech 2019
DOI: 10.21437/interspeech.2019-1111

Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks

Abstract: We present a novel learning-based approach to estimate the direction-of-arrival (DOA) of a sound source using a convolutional recurrent neural network (CRNN) trained via regression on synthetic data and Cartesian labels. We also describe an improved method to generate synthetic data to train the neural network using state-of-the-art sound propagation algorithms that model specular as well as diffuse reflections of sound. We compare our model against three other CRNNs trained using different formulations of the…
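
The abstract mentions regression on Cartesian labels without giving details. As a minimal sketch of what such a label could look like, the Python below maps an (azimuth, elevation) pair to a unit vector and measures the angle between two direction vectors. The function names, the degree units, and the axis convention (x forward, y left, z up) are assumptions made for illustration and are not taken from the paper.

```python
import math


def doa_to_cartesian(azimuth_deg, elevation_deg):
    """Map a direction given in degrees to a unit vector (x, y, z).

    Assumed convention: azimuth is measured in the x-y plane from the
    x-axis, elevation is measured upward from that plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        math.cos(el) * math.cos(az),
        math.cos(el) * math.sin(az),
        math.sin(el),
    )


def cartesian_to_doa(x, y, z):
    """Recover (azimuth, elevation) in degrees from a non-zero vector.

    The vector does not need unit length, so a raw regression output
    can be passed in directly.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.asin(z / norm))
    return azimuth, elevation


def angular_error_deg(u, v):
    """Angle in degrees between two non-zero 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))
```

One property this sketch makes visible: azimuths of 179 and -179 degrees differ by 358 as raw angles, yet their unit vectors are only 2 degrees apart, so a Cartesian target has no discontinuity at the wrap-around point.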

Cited by 39 publications (25 citation statements)
References 21 publications
“…Tang et al found significant performance increases on an automatic speech recognition and keyword spotting task in [15] by using an acoustic simulation method that includes diffuse reflections. Using the same method, Tang et al also observed improved performance at a DOA estimation task [14].…”
Section: Introduction (mentioning)
confidence: 94%
“…Inspired by the success of DNNs in many fields, several such approaches have been proposed for sound/speech source localisation (SSL) [7,8,9,10,11,12,13,14].…”
Section: Introduction (mentioning)
confidence: 99%
“…This allows us to compute both early reflections and late reverberation efficiently. One speech-related problem that has benefited from more accurate simulations is the direction-of-arrival estimation task [37]. We argue that using a more accurate geometric acoustic simulation that faithfully models the late reverberation for general speech-related training will lead to better performance in learning-based models.…”
Section: Diffuse Acoustic Simulation (mentioning)
confidence: 99%
“…Similarly, Bryan estimates the T60 and the direct-to-reverberant ratio (DRR) from a single speech recording via augmented datasets [5]. Tang et al trained CRNN models purely based on synthetic spatial IRs that generalize to real-world recordings [60]. We strategically design an augmentation scheme to address the challenge of equalization's dependence on both IRs and speaker voice profiles, which is fully complimentary to all prior data-driven methods.…”
Section: Related Work (mentioning)
confidence: 99%