Audio Assisted Robust Visual Tracking With Adaptive Particle Filtering

Kılıç, Volkan; Barnard, Mark; Wang, Wenwu; Kittler, Josef

doi:10.1109/tmm.2014.2377515

Cited by 59 publications

(20 citation statements)

References 49 publications

“…For the evolution of the time dependent speaker state, the constant velocity model is employed [36], [45] given as,…”

Section: Multi-speaker Tracking With the PHD Filter (mentioning)

confidence: 99%
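
The quoted statement names a constant velocity model but elides the equation itself. As a minimal sketch of the generic textbook form only, and not the equation from the cited papers, the following assumes a hypothetical state layout `[x, y, vx, vy]`, a time step `dt`, and isotropic Gaussian process noise with standard deviation `noise_std`; none of these names come from the source.

```python
import random


def constant_velocity_step(state, dt=1.0, noise_std=0.0, rng=None):
    """Propagate a state [x, y, vx, vy] by one step of a constant velocity model.

    The state layout and the noise model are illustrative assumptions,
    not taken from the cited papers.
    """
    rng = rng or random.Random()
    x, y, vx, vy = state
    # Position advances by velocity * dt; velocity is carried over unchanged
    # apart from the additive Gaussian perturbation.
    return [
        x + vx * dt + rng.gauss(0.0, noise_std),
        y + vy * dt + rng.gauss(0.0, noise_std),
        vx + rng.gauss(0.0, noise_std),
        vy + rng.gauss(0.0, noise_std),
    ]


if __name__ == "__main__":
    # Noise-free step: the position moves by velocity * dt.
    print(constant_velocity_step([0.0, 0.0, 1.0, 2.0], dt=0.5))
```

With `noise_std=0.0` the step is deterministic, which makes the motion component easy to check in isolation before noise is added.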

“…The DOA data is introduced to the SMC-PHD filter based on [34] and [36] where the efficiency of the particles is improved under a particle filter framework by re-allocating all the particles around the DOA line which is drawn from the center of the microphone array to a point in the image frame estimated by the projection of DOA to 2D image plane. However, different from [34] and [36] in which the DOA is used in the same way for all the particles, here the contribution of the DOA information is varied depending on the type of the particles.…”

Section: Audio-visual Tracker With SMC-PHD Filter (mentioning)

confidence: 99%
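
The statement above describes re-allocating particles around a DOA line drawn from the microphone array center to a projected point in the image. One possible way to picture this, offered as a rough sketch and not as the authors' relocation rule, is to move each particle part of the way toward its perpendicular projection on that line. The function names and the `strength` parameter are hypothetical, and points are assumed to be 2D image coordinates.

```python
def project_onto_line(point, line_start, line_end):
    """Return the perpendicular projection of `point` onto the infinite line
    through `line_start` and `line_end` (all 2D tuples)."""
    ax, ay = line_start
    bx, by = line_end
    px, py = point
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    if denom == 0.0:
        raise ValueError("line_start and line_end must be distinct points")
    # Scalar position of the projection along the line direction.
    t = ((px - ax) * dx + (py - ay) * dy) / denom
    return (ax + t * dx, ay + t * dy)


def relocate_towards_line(particles, line_start, line_end, strength=0.5):
    """Move each particle a fraction `strength` of the way to the DOA line.

    strength=0 leaves particles untouched; strength=1 places them on the line.
    This interpolation rule is an assumption made for illustration.
    """
    moved = []
    for p in particles:
        q = project_onto_line(p, line_start, line_end)
        moved.append((p[0] + strength * (q[0] - p[0]),
                      p[1] + strength * (q[1] - p[1])))
    return moved


if __name__ == "__main__":
    # Hypothetical DOA line along the x-axis and two particles above it.
    print(relocate_towards_line([(2.0, 4.0), (1.0, 2.0)], (0.0, 0.0), (4.0, 0.0)))
```

Varying `strength` per particle type would be one way to express the idea, mentioned in the statement, that the DOA contribution differs between particle types.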

“…However, different from [34] and [36] in which the DOA is used in the same way for all the particles, here the contribution of the DOA information is varied depending on the type of the particles. Similar to [34] and [36], we also use the sam-sparse-mean (SSM) method [48] for the DOA estimation which is further enhanced by a third-order Auto-Regressive (AR) model. We should note that there are other audio features and algorithms for extracting these features that could be used in our proposed system, however, exploring other audio detection methods is beyond the scope of this work.…”

Section: Audio-visual Tracker With SMC-PHD Filter (mentioning)

confidence: 99%

“…More specifically, the propagation of the born particles is decided based on the DOA information and the particles are re-located around the line drawn upon the DOA. A similar approach has been used in [34], [35] and [36] under the PF framework for a fixed number of speakers. Here, the SMC-PHD filter is used, and to our knowledge, audio information has not been previously used with visual information in a SMC-PHD filter as we do here.…”

mentioning

confidence: 99%

Mean-Shift and Sparse Sampling-Based SMC-PHD Filtering for Audio Informed Visual Speaker Tracking

Kılıç

Barnard

Wang

et al. 2016

IEEE Trans. Multimedia

Self Cite

Abstract: The probability hypothesis density (PHD) filter based on sequential Monte Carlo (SMC) approximation (also known as SMC-PHD filter) has proven to be a promising algorithm for multi-speaker tracking. However, it has a heavy computational cost as surviving, spawned and born particles need to be distributed in each frame to model the state of the speakers and to estimate jointly the variable number of speakers with their states. In particular, the computational cost is mostly caused by the born particles as they need to be propagated over the entire image in every frame to detect the new speaker presence in the view of the visual tracker. In this paper, we propose to use audio data to improve the visual SMC-PHD (V-SMC-PHD) filter by using the direction of arrival (DOA) angles of the audio sources to determine when to propagate the born particles and re-allocate the surviving and spawned particles. The tracking accuracy of the AV-SMC-PHD algorithm is further improved by using a modified mean-shift algorithm to search and climb density gradients iteratively to find the peak of the probability distribution, and the extra computational complexity introduced by mean-shift is controlled with a sparse sampling technique. These improved algorithms, named as AVMS-SMC-PHD and sparse-AVMS-SMC-PHD respectively, are compared systematically with AV-SMC-PHD and V-SMC-PHD based on the AV16.3, AMI and CLEAR datasets.
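
The abstract mentions a modified mean-shift step that climbs density gradients toward the peak of the distribution. The sketch below shows only the standard, unmodified Gaussian-kernel mean-shift over weighted 2D particles, as an illustration of the iterative climb; the function name, the `bandwidth`, `tol`, and `max_iter` parameters, and the 2D point representation are all assumptions and do not reproduce the paper's modification or its sparse sampling.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth=1.0, tol=1e-6, max_iter=100):
    """Climb from `start` toward a local density peak of weighted 2D points.

    Each iteration replaces the current estimate with the kernel-weighted
    mean of the points, which is the standard mean-shift update.
    """
    x, y = start
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for (px, py), w in zip(points, weights):
            # Gaussian kernel: nearby points pull harder than distant ones.
            k = w * math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2.0 * bandwidth ** 2))
            num_x += k * px
            num_y += k * py
            den += k
        if den == 0.0:
            break  # no point has any influence at this location
        new_x, new_y = num_x / den, num_y / den
        shift = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if shift < tol:
            break  # converged: the estimate has stopped moving
    return x, y


if __name__ == "__main__":
    # Four equally weighted particles placed symmetrically around (5, 5).
    pts = [(4.0, 5.0), (6.0, 5.0), (5.0, 4.0), (5.0, 6.0)]
    print(mean_shift_mode(pts, [1.0] * 4, start=(4.5, 4.8), bandwidth=2.0))
```

Each iteration touches every particle, which gives a sense of why the abstract describes mean-shift as adding computational cost that needs to be controlled.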

“…There is a consensus that different modalities are complementary to each other, which has motivated an increasing interest in cross-modal tracking in the last decade. Most of these works are done in the audio-visual domain [24,8,9]. Combination of other modalities has recently started to become more popular.…”

Section: Introduction (mentioning)

confidence: 99%

Person Tracking Using Audio and Depth Cues

Liu

Campos

Wang

et al. 2015

2015 IEEE International Conference on Computer Vision Workshop (ICCVW)

Self Cite

In this paper, a novel probabilistic Bayesian tracking scheme is proposed and applied to bimodal measurements consisting of tracking results from the depth sensor and audio recordings collected using binaural microphones. We use random finite sets to cope with varying number of tracking targets. A measurement-driven birth process is integrated to quickly localize any emerging person. A new bimodal fusion method that prioritizes the most confident modality is employed. The approach was tested on real room recordings and experimental results show that the proposed combination of audio and depth outperforms individual modalities, particularly when there are multiple people talking simultaneously and when occlusions are frequent.