2018
DOI: 10.1016/j.csl.2017.12.002

Localizing speakers in multiple rooms by using Deep Neural Networks

Cited by 29 publications (21 citation statements)
References 26 publications
“…The baseline system is a state-of-the-art DNN-based localisation system using GCC-PHAT features as inputs [6,22]. GCC-PHAT features are computed as the inverse transform of the frequency domain cross-correlation of two audio signals captured by a microphone pair.…”
Section: Baseline System (mentioning)
confidence: 99%
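
The GCC-PHAT computation described in the statement above can be sketched as follows. This is a minimal NumPy illustration, not the cited baseline's implementation: the function name `gcc_phat`, the `max_lag` parameter, the zero-padding length, and the small constant added before normalisation are all assumptions made for the example.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """GCC-PHAT of two signals for lags -max_lag..max_lag (in samples).

    Hypothetical helper for illustration; with this sign convention a
    positive peak lag means x is a delayed copy of y.
    """
    n = len(x) + len(y)                 # zero-pad to limit circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)              # frequency-domain cross-correlation
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)       # inverse transform back to the lag domain
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))

# Usage: a synthetic microphone pair where x lags y by 5 samples.
rng = np.random.default_rng(0)
y = rng.standard_normal(1024)
x = np.concatenate((np.zeros(5), y[:-5]))
features = gcc_phat(x, y, max_lag=20)
estimated_lag = int(np.argmax(features)) - 20
```

The resulting vector (one value per candidate lag) is the kind of feature a DNN-based localiser could take as input; the lag of its peak is a time-difference-of-arrival estimate for the microphone pair.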
“…In [5], probabilistic neural networks were used to estimate the direction of arrival (DOA) in an indoor environment using GCC-based features. A similar scenario was studied in [6] which used a convolutional neural network (CNN) to predict speaker coordinates. Binaural cues are employed in [7], where the cross-correlation function (CCF) was used as features in a DNN to estimate the azimuth of a sound source with simulated head movement.…”
Section: Introduction (mentioning)
confidence: 99%
“…The latter is extended to a multi-channel 3D-CNN system in [31], where log-Mel filterbank energies (40-dimensional) are employed as features, temporal context is exploited by concatenating adjacent time frames, and the resulting 2D single-microphone feature matrices are stacked across channels. Finally, in [32], the aforementioned 3D-CNN is combined with the GCC-PHAT [70] based CNN of [71] to yield a joint SAD and speaker localization network.…”
Section: Related Work (mentioning)
confidence: 99%
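
The input construction described in the statement above (per-microphone feature matrices built from adjacent frames, then stacked across channels) can be sketched roughly as follows. This is only an illustration of the tensor shapes involved: the helper name `stack_context`, the context width, the number of microphones, and the placeholder features are assumptions, and the log-Mel extraction itself is not shown.

```python
import numpy as np

def stack_context(feats, context):
    """Build per-frame multi-channel inputs with temporal context.

    feats: array of shape (channels, frames, mel_bins), e.g. precomputed
           40-dimensional log-Mel filterbank energies per microphone.
    Returns an array of shape (windows, channels, 2*context + 1, mel_bins),
    where windows = frames - 2*context. Hypothetical helper for illustration.
    """
    channels, frames, mel_bins = feats.shape
    width = 2 * context + 1
    # One 2D (width x mel_bins) matrix per microphone, stacked across channels.
    windows = [feats[:, t:t + width, :] for t in range(frames - width + 1)]
    return np.stack(windows)

# Usage with placeholder features: 4 microphones, 100 frames, 40 Mel bins.
feats = np.arange(4 * 100 * 40, dtype=np.float32).reshape(4, 100, 40)
inputs = stack_context(feats, context=5)
```

Each element of `inputs` is a 3D block (channels × time × frequency), which is the kind of input a 3D convolution could operate on.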
“…With the advent and huge increase of applications of deep neural networks in all areas of machine learning, promising works have also been proposed for ASL [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 57, 58, 59, 60, 61]. This is mainly due to the sophisticated capabilities and more careful implementation details of network architectures and the availability of advanced hardware architectures with increased computational capacity.…”
Section: State Of the Art (mentioning)
confidence: 99%
“…The idea of using neural networks for sound processing is not new and has gained popularity in recent years (especially for speech recognition [4]). In the context of ASL, deep learning methods have been recently developed [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. Most of these works focus on obtaining the Direction of Arrival (DOA) of the acoustic source.…”
Section: Introduction (mentioning)
confidence: 99%