2012
DOI: 10.1049/iet-spr.2011.0124

Multimodal (audio–visual) source separation exploiting multi-speaker tracking, robust beamforming and time–frequency masking

Abstract: A novel multimodal source separation approach is proposed for physically moving and stationary sources which exploits a circular microphone array, multiple video cameras, robust spatial beamforming and time-frequency masking. The challenge of separating moving sources, including higher reverberation time (RT) even for physically stationary sources, is that the mixing filters are time varying; as such the unmixing filters should also be time varying but these are difficult to determine from only audio measureme…

Citations: cited by 22 publications (3 citation statements)
References: 51 publications (65 reference statements)
“…In general, ABS-IPC systems can enhance the performance by using larger false alarm rates. For example, in test EPas2.4, when the false alarm rate is increased to 0.1, the output SINRs of the ABS-IPC system increase to (12, 15) dB (while the output SINRs of the ABS system drop to (5, 13) dB).…”
Section: Assessment Methods (mentioning)
confidence: 96%
“…Other existing approaches for active speaker identification in multi-speaker environments include video signal processing, speaker recognition, and multi-source localisation techniques aided by clustering using Gaussian mixture models and expectation maximisation [15][16][17][18]. Another approach that is related to the VAC problem is to detect the number of sources.…”
Section: Introduction (mentioning)
confidence: 99%
“…Still in a particle filter tracking framework, [8] proposed to use the maximum global coherence field of the audio signal and image color histogram matching to adapt the reliability of audio and visual information. Finally, along a different line, [9] used visual tracking information to assist source separation and beamforming.…”
Section: Introduction (mentioning)
confidence: 99%