A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.
A new method for real-time tracking of non-rigid objects seen from a moving camera i s p r oposed. The central computational module is based on the mean shift iterations and nds the most probable target position in the current frame. The dissimilarity between the target model (its color distribution) and the target candidates is expressed by a metric derived f r om the Bhattacharyya coe cient. The theoretical analysis of the approach shows that it relates to the Bayesian framework while providing a practical, fast and e cient solution. The capability of the tracker to handle in real-time partial occlusions, signi cant clutter, and target scale variations, is demonstrated for several image sequences.
We present two solutions for the scale selection problem in computer vision. The first one is completely nonparametric and is based on the the adaptive estimation of the normalized density gradient. Employing the sample point estimator, we define the Variable Bandwidth Mean Shift, prove its convergence, and show its superiority over the fixed bandwidth procedure. The second technique has a semiparametric nature and imposes a local structure on the data to extract reliable scale information. The local scale of the underlying density is taken as the bandwidth which maximizes the magnitude of the normalized mean shift vector. Both estimators provide practical tools for autonomous image and quasi real-time video analysis and several examples are shown to illustrate their effectiveness.[15], which was proven to be superior to least squares cross validation and biased cross-validation [ll], [lG, ,
This paper describes a computer vision based system for real-time robust traffic sign detection, tracking, and recognition. Such a framework is of major interest for driver assistance in an intelligent automotive cockpit environment. The proposed approach consists of two components. First, signs are detected using a set of Haar wavelet features obtained from Ada-Boost training. Compared to previously published approaches, our solution offers a generic, joint modeling of color and shape information without the need of tuning free parameters. Once detected, objects are efficiently tracked within a temporal information propagation framework. Second, classification is performed using Bayesian generative modeling. Making use of the tracking information, hypotheses are fused over multiple frames. Experiments show high detection and recognition accuracy and a frame rate of approximately 10 frames per second on a standard PC.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.