An Audio-Visual Database For Evaluating Person Tracking Algorithms

Krinidis, Michail; Stamou, George P.; Teutsch, H.; Spors, Sascha; Nikolaidis, Nikos; Rabenstein, Rudolf; Pitas, I.

doi:10.1109/icassp.2005.1415385

Cited by 14 publications

(12 citation statements)

References 9 publications

(9 reference statements)


“…The ideal case involves direct comparison of the output of a tracking system against reference or ground truth data. Video sequences with ground truth were available for the purpose of this study [44]. The sequences were obtained in the Virtual Studio of the Technical University of Ilmenau in Germany, as part of the CARROUSO ("Creating, Assessing and Rendering in Real Time of High Quality AudioVisual Environments in MPEG-4 Context" [45]) European Research Project.…”

Section: Results (mentioning)

confidence: 99%

“…It should be noted, though, that the frame rate can substantially increase (12-15 frames/sec) at the expense of accuracy if the frames are sub-sampled prior to processing or certain internal parameters of the detection algorithms are relaxed. Test video sequences include the CARROUSO project video sequences [44]. In Figures 6, 7 and 8, the results of automatic face detection and 2-D tracking for three of the CARROUSO project video sequences are illustrated.…”

Section: Tracking (mentioning)

confidence: 99%


A monocular system for person tracking: Implementation and testing

Stamou, Krinidis, Nikolaidis, et al. 2007. J Multimodal User Interfaces


This paper presents a complete functional system capable of detecting people and tracking their motion in either live camera feed or pre-recorded video sequences. The system consists of two main modules, namely the detection and tracking modules. Automatic detection aims at locating human faces and is based on fusion of color and feature-based information. Thus, it is capable of handling faces in different orientations and poses (frontal, profile, intermediate). To avoid false detections, a number of decision criteria are employed. Tracking is performed using a variant of the well-known Kanade-Lucas-Tomasi tracker, while occlusion is handled through a re-detection stage. Manual intervention is allowed to assist both modules if required. In manual mode, the system can track any object of interest, so long as there are enough features to track. The system caters for calibrated cameras and can provide 3-D coordinates of any tracked object(s) of interest. It has been tested with very good results on a variety of video sequences, including a database of studio video sequences, for which 3-D ground truth data, originating from a 4-camera infrared tracking system, exist.
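The comparison of tracker output against ground truth described in this abstract can be illustrated with a per-frame Euclidean error. This is a minimal sketch, assuming the estimated and reference trajectories are already time-aligned and expressed in the same 3-D coordinate frame. The function name `tracking_error` and the sample coordinates are hypothetical and are not taken from the cited paper or the database.

```python
import math

def tracking_error(estimated, reference):
    """Return the per-frame Euclidean errors and their mean.

    Both arguments are sequences of (x, y, z) tuples, one per frame.
    Assumption: frame i of `estimated` corresponds to frame i of `reference`.
    """
    if len(estimated) != len(reference):
        raise ValueError("trajectories must have the same number of frames")
    # Distance between the tracked position and the ground truth, frame by frame.
    errors = [math.dist(e, r) for e, r in zip(estimated, reference)]
    mean_error = sum(errors) / len(errors) if errors else 0.0
    return errors, mean_error

# Hypothetical trajectories (made-up values, arbitrary units).
estimated = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.5), (2.0, 3.0, 5.0)]

errors, mean_error = tracking_error(estimated, reference)
print(errors, mean_error)
```

A mean error alone hides outliers such as frames where the track is lost, so reporting the per-frame list alongside it is usually more informative.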


“…The proposed classes are an amalgam of low-level features, like histograms, FFTs etc, and high semantic entities, like scene theme, person recognition, emotions, sounds qualification, etc. Nowadays researchers are very interested in extracting high semantic features [9], [10], [11]. Having this in mind, we believe that the proposed template can give the genre of a new digital video processing approach.…”

Section: MPEG-7 Video Descriptors (mentioning)

confidence: 99%

An MPEG-7 Based Description Scheme for Video Analysis Using Anthropocentric Video Content Descriptors

Vretos, Solachidis, Pitas. 2005. Advances in Informatics


Abstract. MPEG-7 has emerged as the standard for multimedia data content description. As it is in its early age, it tries to evolve towards a direction in which semantic content description can be implemented. In this paper we provide a number of classes to extend the MPEG-7 standard so that it can handle the video media data, in a more uniform and anthropocentric way. Many descriptors (Ds) and description schemes (DSs) already provided by the MPEG-7 standard can help to implement semantics of a media. However, by grouping together several MPEG-7 classes and adding new Ds, better results in the video production and video analysis tasks can be produced. Several classes are proposed in this context and we show that the corresponding scheme produce a new profile which is more flexible in all types of applications as they are described in [1].


“…These data are made available as part of an audiovisual database [18]. This database also includes reference data of the speaker positions measured using infrared sensors.…”

Section: Results (mentioning)

confidence: 99%

“…1 (a) and (b), we see that (18) and (19) can in fact directly be considered as the generalization of the AED condition (2) to multiple sources. Using a proper coefficient initialization, (18) is the corresponding equation to estimate the TDOA of source 1, while (19) gives the TDOA of source 2. Moreover, since the coefficient initialization, described in Sect.…”

Section: TDOA Estimation and Relation to SIMO-based Approach (mentioning)

confidence: 88%

Simultaneous Localization of Multiple Sound Sources using Blind Adaptive MIMO Filtering

Buchner, Aichner, Stenglein, et al. Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

Self Cite


Blind adaptive filtering for time delay of arrival (TDOA) estimation is a very powerful method for acoustic source localization in reverberant environments with broadband signals like speech. Based on a recently presented generic framework for blind signal processing for convolutive mixtures, called TRINICON, we present a TDOA estimation method for simultaneous multidimensional localization of multiple sources. Moreover, an interesting link to the known single-input multiple-output (SIMO)-based adaptive eigenvalue decomposition (AED) method is shown. We evaluate the novel multiple-input multiple-output (MIMO)-based approach and compare it with the known SIMO-based method in a reverberant acoustic environment using reference data of the positions obtained from infrared sensors. The results show that the new approach is very robust against reverberation and background noise.
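To make the notion of a TDOA concrete, the sketch below estimates the delay between two microphone signals by maximizing a plain time-domain cross-correlation. This is a deliberately simpler, swapped-in technique: it is not the blind adaptive MIMO method or the AED method named in the abstract, and it should not be expected to handle reverberation or multiple simultaneous sources. The function name `estimate_tdoa`, the signals, and the 8000 Hz sample rate are hypothetical.

```python
def estimate_tdoa(x1, x2, max_lag, sample_rate):
    """Estimate the delay of x2 relative to x1, in seconds.

    Searches integer lags in [-max_lag, max_lag] and returns the lag
    that maximizes sum_i x1[i] * x2[i + lag], divided by the sample rate.
    A positive result means the signal reaches microphone 2 later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x1)):
            j = i + lag
            if 0 <= j < len(x2):
                score += x1[i] * x2[j]
        # Strict comparison keeps the first lag reaching the maximum.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

# Hypothetical test signals: a short pulse, and the same pulse 3 samples later.
x1 = [0.0] * 32
x2 = [0.0] * 32
x1[10], x1[11] = 1.0, 0.5
x2[13], x2[14] = 1.0, 0.5

sample_rate = 8000  # Hz, assumed for illustration
tdoa = estimate_tdoa(x1, x2, max_lag=8, sample_rate=sample_rate)
print(tdoa)
```

With several microphone pairs, delays of this kind constrain the source position, which is what can then be checked against reference positions such as those from the infrared sensors.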
