2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017
DOI: 10.1109/icassp.2017.7952686

3D audio-visual speaker tracking with an adaptive particle filter

Abstract: We propose an audiovisual fusion algorithm for 3D speaker tracking from a localised multi-modal sensor platform composed of a camera and a small microphone array. After extracting audiovisual cues from individual modalities we fuse them adaptively using their reliability in a particle filter framework. The reliability of the audio signal is measured based on the maximum Global Coherence Field (GCF) peak value at each frame. The visual reliability is based on colour-histogram matching with detection results com…
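
As a rough illustration of the reliability-driven fusion the abstract describes, the sketch below weights each particle by an audio likelihood and a visual likelihood, each raised to the power of its normalised reliability. The abstract does not state the paper's fusion rule, so this exponent-weighted product is an assumption, and the names `fuse_weights` and `estimate` are hypothetical. The reliabilities are assumed to be non-negative scalars computed elsewhere, for example from the GCF peak value (audio) and a colour-histogram match score (visual).

```python
import numpy as np


def fuse_weights(audio_lik, video_lik, audio_rel, video_rel, eps=1e-12):
    """Fuse per-particle likelihoods using per-modality reliabilities.

    Assumed rule (not taken from the paper): w_i is proportional to
    audio_lik_i ** a * video_lik_i ** v, where a and v are the
    reliabilities normalised to sum to one.
    """
    audio_lik = np.asarray(audio_lik, dtype=float)
    video_lik = np.asarray(video_lik, dtype=float)

    total = audio_rel + video_rel
    if total <= 0:
        # Neither modality is trusted: fall back to uniform weights.
        return np.full(audio_lik.shape, 1.0 / audio_lik.size)

    a = audio_rel / total
    v = video_rel / total

    # Work in the log domain to avoid underflow with small likelihoods.
    log_w = a * np.log(audio_lik + eps) + v * np.log(video_lik + eps)
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()


def estimate(particles, weights):
    """Weighted mean of the particle positions, shape (N, 3) -> (3,)."""
    return np.asarray(weights) @ np.asarray(particles, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = rng.normal(size=(100, 3))   # hypothetical 3D positions
    audio_lik = rng.uniform(size=100)       # placeholder likelihoods
    video_lik = rng.uniform(size=100)
    w = fuse_weights(audio_lik, video_lik, audio_rel=0.2, video_rel=0.8)
    print(estimate(particles, w))
```

With this rule, a modality whose reliability drops to zero has no influence on the weights, so the filter degrades to single-modality tracking instead of being misled by an unreliable cue.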

Cited by 27 publications
(31 citation statements)
References 21 publications
“…This corresponds to a small process noise, which cannot be handled efficiently using PFs. However, the PF baseline method from [32] shows a performance comparable to the EKF-based methods, which indicates that the MDF approach used in this algorithm is efficient on this dataset.…”
Section: B Audiovisual Tracking Performance Analysis (mentioning)
confidence: 91%
“…A comparison of the Bayesian filtering framework proposed in this study with state-of-the-art audiovisual speaker tracking methods is the primary focus of the second evaluation scenario. Four different frameworks were selected as baseline methods: the standard EKF with audiovisual observations, the audiovisual fusion technique based on an iterated EKF as proposed by Gehring et al. [30], the PF-based approach with adaptive particle weighting introduced by Gerlach et al. [31] and the recently proposed framework by Qian et al. [32], which explicitly incorporates sensor reliability measures into the weighting stage of the PF. These methods are compared with the ODSW-EKF with Dirichlet prior and a DSW-EKF with corresponding prediction model based on the logistic function, as introduced in Sec.…”
Section: B Audiovisual Tracking Performance Analysis (mentioning)
confidence: 99%
“…More recently, audio-visual trackers based on particle filtering (PF) and probability hypothesis density (PHD) filters were proposed, e.g. [4]-[7], [20]-[22]. [6] used DOAs of audio sources to guide the propagation of particles and combined the filter with a mean-shift algorithm to reduce the computational complexity.…”
Section: Related Work (mentioning)
confidence: 99%
“…Alternatively, [7] used a Markov chain Monte Carlo particle filter (MCMC-PF) to increase sampling efficiency. Still in a particle filter tracking framework, [8] proposed to use the maximum global coherence field of the audio signal and image color-histogram matching to adapt the reliability of audio and visual information. Finally, along a different line, [9] used visual tracking information to assist source separation and beamforming.…”
Section: Introduction (mentioning)
confidence: 99%