Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
DOI: 10.1109/icassp.2005.1415385

An Audio-Visual Database For Evaluating Person Tracking Algorithms

Abstract: This paper presents an audio-visual database that can be used as a reference database for testing and evaluation of video, audio or joint audio-visual person tracking algorithms, as well as speaker localization methods. Additional possible uses include the testing of face detection and pose estimation algorithms. A number of different scenes are included in the database, ranging from simple to complex scenes that can challenge existing algorithms. They include different subjects, with appearances that can caus…

Cited by 14 publications (12 citation statements)
References 9 publications (9 reference statements)
“…The ideal case involves direct comparison of the output of a tracking system against reference or ground truth data. Video sequences with ground truth were available for the purpose of this study [44]. The sequences were obtained in the Virtual Studio of the Technical University of Ilmenau in Germany, as part of the CARROUSO ("Creating, Assessing and Rendering in Real Time of High Quality AudioVisual Environments in MPEG-4 Context" [45]) European Research Project.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…It should be noted, though, that the frame rate can substantially increase (12-15 frames/sec) at the expense of accuracy if the frames are sub-sampled prior to processing or certain internal parameters of the detection algorithms are relaxed. Test video sequences include the CARROUSO project video sequences [44]. In Figures 6, 7 and 8, the results of automatic face detection and 2-D tracking for three of the CARROUSO project video sequences are illustrated.…”
Section: Tracking
Citation type: mentioning
confidence: 99%
“…The proposed classes are an amalgam of low-level features, like histograms, FFTs etc, and high semantic entities, like scene theme, person recognition, emotions, sounds qualification, etc. Nowadays researchers are very interested in extracting high semantic features [9], [10], [11]. Having this in mind, we believe that the proposed template can give the genre of a new digital video processing approach.…”
Section: MPEG-7 Video Descriptors
Citation type: mentioning
confidence: 99%
“…These data are made available as part of an audiovisual database [18]. This database also includes reference data of the speaker positions measured using infrared sensors.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…1 (a) and (b), we see that (18) and (19) can in fact directly be considered as the generalization of the AED condition (2) to multiple sources. Using a proper coefficient initialization, (18) is the corresponding equation to estimate the TDOA of source 1, while (19) gives the TDOA of source 2. Moreover, since the coefficient initialization, described in Sect.…”
Section: TDOA Estimation and Relation to SIMO-Based Approach
Citation type: mentioning
confidence: 88%