2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.2004.1327252

Multiple person and speaker activity tracking with a particle filter

Cited by 61 publications (64 citation statements)
References 7 publications
“…In comparison to the KF and EKF approaches, the PF approach is more robust for nonlinear models as it can approach the Bayesian optimal estimate with a sufficiently large number of particles [15]. It has been widely employed for speaker tracking problems [16], [17], [18]. For example, in [16] and [17], PF is used to fuse object shapes and audio information.…”
Section: Introduction
confidence: 99%
“…For example, in [16] and [17], PF is used to fuse object shapes and audio information. In [18], independent audio and video observation models are fused for simultaneous tracking and detection of multiple speakers. One challenge in using PF, however, is to choose an appropriate number of particles.…”
Section: Introduction
confidence: 99%
“…Audio modality also suffers from environmental factors such as background noise, reverberation and reflections. It has been shown that information from heterogeneous sensors such as cameras and microphones can be fused in a unified manner both at sensor level [5] or at feature level [6]. This fusion of modalities can compensate for the failure of each other and is used in target localization and tracking in indoor and outdoor scenarios using a network of audiovisual sensors.…”
Section: Related Work
confidence: 99%
“…For example, feature-level (early) fusion of video and audio has been proposed for the problems of speech processing [Hershey et al 2004] and recognition [Nefian et al 2002], tracking [Checka et al 2004], and monologue detection [Nock et al 2002] by using the mutual information among the video and audio features under the assumption that audio and video signals are individually and jointly Gaussian random variables. On the other hand, late fusion strategies have also been used in sensor fusion applications [Rao and Whyte 1993], [Chair and Varshney 1986], [Kam et al 1992].…”
Section: Related Work
confidence: 99%