ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413399

Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain

Abstract: Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization. Both applications benefit from a known speaker position when, for instance, applying beamforming or assigning unique speaker identities. Recently, several approaches utilizing acoustic signals augmented with visual data have been proposed for this task. However, both the acoustic and the visual modality may be corrupted in specific spatial regions, for instance due to poor lighting c…

Cited by 3 publications (1 citation statement)
References 25 publications
“…Other methods approach the task of ASL, which seeks to localize speakers spatially within the scene rather than classifying bounding box tracks [7,16,24,37,38,87,104,106]. Several use multichannel audio to incorporate directional audio information [7,16,24,37,38,104,106]. Recently,…”
Section: Related Work (mentioning)
confidence: 99%