2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
DOI: 10.1109/iccvw.2017.60

Exploiting the Complementarity of Audio and Visual Data in Multi-speaker Tracking

Abstract: Multi-speaker tracking is a central problem in human-robot interaction. In this context, exploiting auditory and visual information is gratifying and challenging at the same time. Gratifying because the complementary nature of auditory and visual information allows us to be more robust against noise and outliers than unimodal approaches. Challenging because how to properly fuse auditory and visual information for multi-speaker tracking is far from being a solved problem. In this paper we propose a probabilistic…
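The abstract attributes robustness to the complementarity of audio and visual cues, but the proposed probabilistic model is truncated here. As a generic illustration only (not the authors' model), the sketch below fuses an audio and a visual direction estimate for one speaker by precision weighting, assuming both are independent Gaussian estimates. The function name `fuse`, the variances and the degree units are hypothetical. When one modality is missing, for example a silent speaker or one outside the camera's field of view, the other is used alone.

```python
from typing import Optional, Tuple

# Hypothetical representation: a direction estimate as (mean in degrees, variance in degrees^2).
Estimate = Tuple[float, float]


def fuse(audio: Optional[Estimate], visual: Optional[Estimate]) -> Optional[Estimate]:
    """Precision-weighted fusion of two independent Gaussian estimates (illustrative sketch).

    Either modality may be None (no observation); the other is then returned unchanged.
    """
    available = [e for e in (audio, visual) if e is not None]
    if not available:
        return None
    # The fused precision (inverse variance) is the sum of the individual precisions.
    precision = sum(1.0 / var for _, var in available)
    # The fused mean is the precision-weighted average of the individual means.
    mean = sum(mu / var for mu, var in available) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Assumed values: audio is coarse (variance 100), vision is sharp (variance 4).
    print(fuse((30.0, 100.0), (20.0, 4.0)))
    print(fuse((30.0, 100.0), None))  # visual observation missing: audio only
```

The fused variance is never larger than the smaller input variance, which is the sense in which two modalities help. Plain Gaussian fusion is not robust to outliers on its own; that usually calls for heavier-tailed or mixture likelihoods.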

Cited by 23 publications (26 citation statements). References 19 publications.

Citation statements
“…Similar to [7], [16] uses one CGMM for each predefined speaker; the model is plugged into a recursive EM (REM) algorithm in order to update the […] Multiple Speaker Tracking (Section IV). Speaker tracking methods are generally based on Bayesian inference which combines localization with dynamic models in order to estimate the posterior probability distribution of audio-source directions, e.g. [17]-[19]. Kalman filtering and particle filtering were used in [20] and in [21], respectively, for tracking a single audio source.…”
(Citing section: Introduction)
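The snippet above notes that Kalman filtering was used in [20] to track a single audio source. As a hedged illustration of that idea (not the method of [20]), here is a minimal constant-velocity Kalman filter on a speaker's azimuth. The class name `KalmanCV`, the noise parameters `q` and `r`, the initial covariance and the degree units are all assumptions, and angle wrap-around at ±180° is ignored.

```python
from dataclasses import dataclass


@dataclass
class KalmanCV:
    """Hypothetical constant-velocity Kalman filter on one angle (degrees).

    State x = [angle, angular velocity]; covariance P = [[p00, p01], [p01, p11]].
    q: process-noise intensity, r: measurement-noise variance (assumed values).
    """
    angle: float
    velocity: float = 0.0
    p00: float = 100.0
    p01: float = 0.0
    p11: float = 100.0
    q: float = 1.0
    r: float = 25.0

    def predict(self, dt: float) -> None:
        # State transition x <- F x with F = [[1, dt], [0, 1]].
        self.angle += dt * self.velocity
        # Covariance P <- F P F^T + Q, with white-noise-acceleration Q.
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        p11 = self.p11
        self.p00 = p00 + self.q * dt**4 / 4
        self.p01 = p01 + self.q * dt**3 / 2
        self.p11 = p11 + self.q * dt**2

    def update(self, z: float) -> None:
        # Measurement model z = angle + noise, i.e. H = [1, 0].
        s = self.p00 + self.r        # innovation variance
        k0 = self.p00 / s            # Kalman gain for the angle
        k1 = self.p01 / s            # Kalman gain for the velocity
        innovation = z - self.angle
        self.angle += k0 * innovation
        self.velocity += k1 * innovation
        # Covariance P <- (I - K H) P.
        p00, p01, p11 = self.p00, self.p01, self.p11
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p11 = p11 - k1 * p01


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    kf = KalmanCV(angle=0.0)
    for t in range(1, 51):
        true_angle = 10.0 + 2.0 * t  # assumed speaker moving at 2 deg/s
        kf.predict(dt=1.0)
        kf.update(true_angle + rng.gauss(0.0, 5.0))
    print(f"estimate {kf.angle:.1f} deg, velocity {kf.velocity:.2f} deg/s")
```

The dynamic model (predict) and the localization measurement (update) are combined recursively, which is the Bayesian structure the snippet describes; in the linear-Gaussian case the posterior stays Gaussian, so two moments suffice.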
“…This is probably one of the most prominent features of the method, in contrast with most existing audio-visual tracking methods which require continuous and simultaneous flows of visual and audio data. This paper is an extended version of [25] and of [26]. The probabilistic model and its variational approximation were briefly presented in [25] together with preliminary results obtained with three AVDIAR sequences. Reverberation-free audio features were used in [26] where it was shown that good performance could be obtained with these features when the audio mapping was trained in one room and tested in another room.…”
(Citing section: Related Work)
“…To localize moving speakers, a tracking scheme based on Bayesian techniques estimates the posterior distribution of source locations given a sequence of instantaneous estimates of localization features (or of speaker locations) and a dynamic model of source movement, e.g. [12]-[14]. To tackle speech turns, speaker birth and death processes [15] and/or a model of speech activity [16] can be included.…”
(Citing section: Introduction)
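The scheme described above estimates a posterior over source locations from a sequence of instantaneous localization estimates plus a dynamic model. A bootstrap particle filter is one common way to approximate that posterior when it is not Gaussian; the sketch below is a generic example for a single source direction, not the method of [12]-[14]. The random-walk dynamics, Gaussian likelihood, uniform prior over [-90°, 90°] and all parameter values are assumptions, and speaker birth/death or speech-activity modeling is not included.

```python
import math
import random
from typing import List


def gaussian_likelihood(z: float, x: float, sigma: float) -> float:
    # Unnormalized Gaussian likelihood p(z | x); the constant cancels when weights are normalized.
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)


def particle_filter(observations: List[float], n_particles: int = 500,
                    motion_std: float = 2.0, obs_std: float = 5.0,
                    seed: int = 0) -> List[float]:
    """Hypothetical bootstrap particle filter for one source direction (degrees).

    Dynamic model (assumed): x_t = x_{t-1} + N(0, motion_std^2).
    Observation model (assumed): z_t = x_t + N(0, obs_std^2).
    Returns the posterior-mean direction after each observation.
    """
    rng = random.Random(seed)
    # Prior: particles spread uniformly over the frontal half-plane (assumption).
    particles = [rng.uniform(-90.0, 90.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # 1. Predict: propagate each particle through the dynamic model.
        particles = [x + rng.gauss(0.0, motion_std) for x in particles]
        # 2. Weight: score each particle by the likelihood of the observation.
        weights = [gaussian_likelihood(z, x, obs_std) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # Every weight underflowed (e.g. a gross outlier): fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        # Posterior mean of the direction under the weighted particle set.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # 3. Resample (multinomial) so particles concentrate on likely directions.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    rng = random.Random(1)
    zs = [30.0 + rng.gauss(0.0, 5.0) for _ in range(40)]  # assumed static source at 30 deg
    print(f"final direction estimate: {particle_filter(zs)[-1]:.1f} deg")
```

Unlike the Kalman filter, the particle set can represent multimodal posteriors, at the cost of Monte Carlo noise that shrinks roughly with the square root of the number of particles.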