Abstract. In this paper, we propose to track multiple previously unseen objects in unconstrained scenes. Rather than considering objects individually, we model objects in mutual context with one another to achieve robust and accurate tracking. We introduce a unified framework that combines Individual Object Models (IOMs) and Mutual Relation Models (MRMs). The MRMs consist of three components: a relational graph that indicates which objects are related, mutual relation vectors computed between related objects to capture their interactions, and relational weights that balance all interactions with the IOMs. Because MRMs vary over time, we propose online algorithms that adapt them to the current situation: we update the relational graphs by analyzing object trajectories and cast relational weight learning as an online latent SVM problem. Extensive experiments on challenging real-world video sequences demonstrate the efficiency and effectiveness of our framework.
Text detection in natural scenes is fundamental to text image analysis. In this paper, we propose a context-based approach for robust and fast text detection. Our main contribution is the new concept of a key region, which is described by context derived from stroke properties, appearance consistency, and the specific spatial distribution of a text line. With these context descriptors, we train an SVM classifier to identify key regions among candidate regions. The candidate regions are connected components generated by a local binarization algorithm within areas detected by an offline-learned text patch detector. Experimental results on two benchmark datasets demonstrate that our approach achieves performance competitive with state-of-the-art algorithms, including the stroke width transform (SWT) [1] and the hybrid approach based on CRFs [2], with speedups of about 1.7x–4.4x.
Abstract. In this paper, we aim to detect humans in video across large viewpoint changes, which is very challenging due to the diversity of human appearance and motion over a wide range of viewpoints compared with the common frontal viewpoint. We propose 1) a new feature, the Intra-frame and Inter-frame Comparison Feature, which combines appearance and motion information, 2) an Enhanced Multiple Clusters Boost algorithm that automatically co-clusters samples from various viewpoints together with discriminative features, and 3) a Multiple Video Sampling strategy that makes the approach robust to human motion and frame-rate changes. Because of the large number of samples and features, we propose a two-stage tree-structured detector, using only appearance in the first stage and both appearance and motion in the second stage. Our approach is evaluated on challenging real-world scenes, including the PETS2007 dataset, the ETHZ dataset, and our own collected videos, demonstrating its effectiveness and efficiency.