2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7471706
Sound source localization based on deep neural networks with directional activate function exploiting phase information

Cited by 142 publications (99 citation statements)
References 18 publications
“…Additionally, for each architecture, we tune the model parameters such as the number of CNN, RNN, and FC layers (0 to 4) and nodes (in the set of [16, 32, 64, 128, 256, 512]). The input sequence length is tuned in the set of [32, 64, 128, 256, 512], the DOA and SED branch output loss weights in the set of [1, 5, 50, 500], the regularization (dropout in the set of [0, 0.1, 0.2, 0.3, 0.4, 0.5]; L1 and L2 in the set of [0, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶, 10⁻⁷]), and the CNN max-pooling in the set of [2, 4, 6, 8, 16] for each layer. The best set of parameters is the one that gives the lowest SELD score on the three cross-validation splits of the dataset.…”
Section: Methods (mentioning)
confidence: 99%
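The tuning procedure quoted above can be sketched as a search over the listed hyperparameter sets. A minimal illustration using random sampling is given below; the value sets are taken from the excerpt, while the parameter names, the `random_search` helper, and the scoring callback are hypothetical placeholders (the cited work does not specify its search strategy, and `score_fn` stands in for training a model and averaging the SELD score over the three cross-validation splits):

```python
import random

# Hyperparameter sets as listed in the excerpt; the dictionary keys are
# illustrative names, not taken from the cited work.
SEARCH_SPACE = {
    "n_cnn_layers": [0, 1, 2, 3, 4],
    "n_rnn_layers": [0, 1, 2, 3, 4],
    "n_fc_layers": [0, 1, 2, 3, 4],
    "n_nodes": [16, 32, 64, 128, 256, 512],
    "seq_len": [32, 64, 128, 256, 512],
    "branch_loss_weight": [1, 5, 50, 500],
    "dropout": [0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "l1_l2_reg": [0] + [10 ** -e for e in range(1, 8)],
    "max_pool": [2, 4, 6, 8, 16],
}

def sample_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(score_fn, n_trials=50, seed=0):
    """Return the sampled configuration with the lowest score
    (e.g. SELD score averaged over the cross-validation splits)."""
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_trials)]
    return min(configs, key=score_fn)
```

In practice `score_fn` would train and evaluate a model per configuration, which dominates the cost; the sampling logic itself is trivial by comparison.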
“…In [14], [16], GCC vectors computed from the microphone signals are provided as input to the learning framework. In [15], [17], similarly to the computations involved in the MUSIC localization method, the eigenvalue decomposition of the spatial correlation matrix is performed to obtain the eigenvectors spanning the noise subspace, which are then provided as input to a neural network. In [13], a binaural setup is considered, and binaural cues computed at different frequency sub-bands are given as input.…”
Section: Introduction (mentioning)
confidence: 99%
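The GCC vectors mentioned in the excerpt are commonly computed with the phase transform (GCC-PHAT), whose peak lag estimates the time-difference of arrival between a microphone pair. A minimal sketch for one pair follows; the function name, FFT length, and synthetic example are illustrative assumptions, not details from [14] or [16]:

```python
import numpy as np

def gcc_phat(x1, x2, n_fft=512):
    """GCC-PHAT cross-correlation between two microphone signals.

    The index of the peak, relative to the center of the returned
    vector, estimates the delay of x2 relative to x1 in samples.
    """
    X1 = np.fft.rfft(x1, n=n_fft)
    X2 = np.fft.rfft(x2, n=n_fft)
    cross = np.conj(X1) * X2
    # Phase transform: discard magnitude, keep only phase information
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n_fft)
    # Rotate so zero lag sits at index n_fft // 2
    return np.roll(cc, n_fft // 2)

# Synthetic check: x2 is x1 circularly delayed by 5 samples
rng = np.random.default_rng(0)
x1 = rng.standard_normal(512)
x2 = np.roll(x1, 5)
cc = gcc_phat(x1, x2)
estimated_delay = int(np.argmax(cc)) - 256  # peak lag in samples
```

In a learning framework such as those cited, the GCC-PHAT vector (or a window of lags around zero) for each microphone pair would be stacked into the network input, rather than reduced to the single peak lag shown here.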
“…Knowing the types of sounds also helps improve the performance of speech enhancement and separation systems [4,5]. Robotic systems can employ SED for navigation and natural interaction with surrounding acoustic environments [6,7]. Smart home devices can benefit from it for environmental sound understanding [8,9].…”
Section: Introduction (mentioning)
confidence: 99%