2021
DOI: 10.1016/j.patcog.2021.107906

Multimodal fusion for indoor sound source localization

Cited by 13 publications (5 citation statements)
References 43 publications

“…To train this model, 360° images with multi-channel audio signals were fed into visual and auditory DNNs. Chen et al. [23] replaced the DNN with SVM and trained the models using 1-channel audio from a single microphone to simultaneously estimate the DoA and the distance between the sound source and the microphone. Regrettably, although the above systems can accurately locate multiple sound sources at the same time, they are unable to identify the sources and focus on a specific target, which is actually a highly desired capability of robot audition in cocktail party environments.…”
Section: Related Work 2.1 Audio-Visual Sound Source Localization (AV-SSL)
Citation type: mentioning; confidence: 99%
“…4) Deep Learning-Based SOL Approaches: Recently, SOL-related works are all based on deep learning [113], [114], [115], [116], whose key idea is to perform audio and visual feature embedding. And most of them can be roughly divided into two groups: 1) the class activation mapping-(CAM-) based ones and 2) the feature similarity-based ones.…”
Section: Sounding Object Localization (SOL)
Citation type: mentioning; confidence: 99%
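The statement above sorts deep-learning SOL methods into CAM-based and feature similarity-based groups. As a rough illustration of the second group only, the sketch below assumes an audio embedding and a visual feature map that already share one embedding space (in a real system both would come from trained audio and visual networks), scores every spatial cell by cosine similarity, and takes the highest-scoring cell as the sounding location. The functions `similarity_map` and `localize` and the toy data are hypothetical; this is a generic sketch, not the method of the indexed paper or of the citing work.

```python
import numpy as np


def similarity_map(visual_feats: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between one audio embedding and each cell of a visual feature map.

    visual_feats: (H, W, D) array; audio_emb: (D,) array. Both are assumed
    (hypothetically) to be embeddings in a shared audio-visual space.
    """
    eps = 1e-8  # avoids division by zero for all-zero vectors
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + eps)
    a = audio_emb / (np.linalg.norm(audio_emb) + eps)
    return v @ a  # (H, W) heatmap, values within [-1, 1]


def localize(visual_feats: np.ndarray, audio_emb: np.ndarray):
    """Return the (row, col) of the cell most similar to the audio, plus the heatmap."""
    heatmap = similarity_map(visual_feats, audio_emb)
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(row), int(col), heatmap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 5, 8))  # toy 4x5 grid of 8-D visual features
    audio = feats[2, 3].copy()          # plant the "sound" at cell (2, 3)
    row, col, _ = localize(feats, audio)
    print(row, col)  # 2 3
```

A CAM-based method would instead derive the heatmap from class-activation weights of a classifier rather than from a direct embedding comparison; the similarity approach shown here needs no class labels at inference time.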
“…However, when encountering a strong reflection condition, noise source localization will encounter serious problems [9]. At present, there are many dereverberation processing methods for speech signal [10][11]. However, there are few methods for steady noise source localization in a strong reflection condition, which still needs further research.…”
Section: Introduction
Citation type: mentioning; confidence: 99%