ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413695

Data-Efficient Framework for Real-World Multiple Sound Source 2D Localization

Abstract: Deep neural networks have recently led to promising results for the task of multiple sound source localization. Yet, they require a lot of training data to cover a variety of acoustic conditions and microphone array layouts. One can leverage acoustic simulators to inexpensively generate labeled training data. However, models trained on synthetic data tend to perform poorly with real-world recordings due to the domain mismatch. Moreover, learning for different microphone array layouts makes the task more compli…
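The abstract notes that acoustic simulators make labeled training data cheap: the simulator places the sources, so their positions are known exactly and serve as labels. A minimal sketch of that idea, assuming free-field direct-path propagation only (no reverberation, which a real room simulator such as an image-source model would add), integer-sample delays, and a speed of sound of 343 m/s. The function names `simulate_free_field` and `make_example` are hypothetical and are not the authors' pipeline.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def simulate_free_field(source_xy, mic_xy, signal, fs):
    """Hypothetical helper: delayed, 1/r-attenuated copies of `signal`
    at each microphone (free field, direct path only, integer delays)."""
    source_xy = np.asarray(source_xy, dtype=float)
    mic_xy = np.asarray(mic_xy, dtype=float)
    dists = np.linalg.norm(mic_xy - source_xy, axis=1)  # metres
    delays = np.round(dists / SPEED_OF_SOUND * fs).astype(int)  # samples
    out = np.zeros((len(mic_xy), len(signal)))
    for m, (d, r) in enumerate(zip(delays, dists)):
        if d < len(signal):  # delay longer than the signal: channel stays silent
            out[m, d:] = signal[: len(signal) - d] / max(r, 1e-3)
    return out


def make_example(rng, mic_xy, num_sources=2, room=(5.0, 5.0), fs=16000, n=4096):
    """Hypothetical helper: mix white-noise sources at random 2D positions.
    The sampled positions are returned as the training labels."""
    positions = rng.uniform((0.0, 0.0), room, size=(num_sources, 2))
    mixture = sum(
        simulate_free_field(p, mic_xy, rng.standard_normal(n), fs) for p in positions
    )
    return mixture, positions
```

Calling `make_example` with a different `mic_xy` yields different multichannel inputs for the same kind of labels, which is one way to see why a model trained for one microphone array layout does not automatically transfer to another.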

Cited by 13 publications (6 citation statements). References 26 publications.
“…It seems that we might be approaching the accuracy limit imposed by this dataset difference, so the accuracy with real recordings can not improve even when we improve the models. In recent years, several domain adaptation techniques have been proposed to improve the accuracy of models trained with simulated signals [48], [49], [50], [51] and it would be interesting to conduct further studies along these lines.…”
Section: LOCATA Dataset (mentioning, confidence: 99%)
“…The major drawback of the DNN-based approaches is the lack of generality. A deep model designed for and trained in a given configuration (for example a given microphone array geometry) will not provide satisfying localization results if the setup changes [24], [25], unless some relevant adaptation method can be used, which is still an open problem in deep learning in general. In this paper, we do not consider this aspect.…”
Section: B. General Principle of DL-Based SSL (mentioning, confidence: 99%)
“…They evaluated several types of outputs (binary, Gaussian-based, and binary followed by regression refinement) which showed promising results on the simulated and real data. They extended their work in [25] where they proposed to use adversarial training (see Section VIII) to improve the network performance on real data, as well as on microphone arrays unseen in the training set, in an unsupervised training scheme. To do that, they introduced a novel explicit transformation layer which helps the network to be invariant to the microphone array layout.…”
Section: G. Encoder-Decoder Neural Network (mentioning, confidence: 99%)
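The statement above mentions binary and Gaussian-based output representations. A minimal sketch of what such targets can look like for 2D localization, assuming the output is a heatmap over a regular (x, y) grid, one isotropic Gaussian per source with an assumed width `sigma`, overlapping bumps merged with an element-wise max, and source estimates decoded as 3x3 local maxima above an assumed threshold. All function names and parameter values are hypothetical; this is not the cited authors' exact encoding, and the regression-refinement variant is not shown.

```python
import numpy as np


def gaussian_heatmap(positions, grid_x, grid_y, sigma=0.2):
    """Gaussian-based target: one bump per source, merged with a max
    so each peak stays at 1 even when bumps overlap."""
    xx, yy = np.meshgrid(grid_x, grid_y, indexing="xy")  # shape (len(grid_y), len(grid_x))
    heat = np.zeros_like(xx)
    for px, py in positions:
        bump = np.exp(-((xx - px) ** 2 + (yy - py) ** 2) / (2 * sigma**2))
        heat = np.maximum(heat, bump)
    return heat


def binary_map(positions, grid_x, grid_y):
    """Binary target: 1 at the grid cell nearest each source, 0 elsewhere."""
    heat = np.zeros((len(grid_y), len(grid_x)))
    for px, py in positions:
        heat[np.argmin(np.abs(grid_y - py)), np.argmin(np.abs(grid_x - px))] = 1.0
    return heat


def decode_peaks(heat, grid_x, grid_y, threshold=0.5):
    """Source estimates = cells that are >= all 8 neighbours and above threshold."""
    h, w = heat.shape
    padded = np.pad(heat, 1, mode="constant", constant_values=-np.inf)
    neighbours = np.stack([
        padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    ])
    is_peak = (heat >= neighbours.max(axis=0)) & (heat > threshold)
    rows, cols = np.nonzero(is_peak)
    return [(float(grid_x[c]), float(grid_y[r])) for r, c in zip(rows, cols)]
```

A Gaussian target gives the network a smooth error signal near the true position, whereas a binary target penalizes a one-cell miss as heavily as a distant one. Because `decode_peaks` uses `>=`, a source lying exactly between two cells can produce two adjacent peaks; a real decoder would need non-maximum suppression or a refinement step.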