We present the design and implementation of an activity recognition system for wide-area aerial video surveillance using Entity Relationship Models (ERM). In this approach, finding an activity is equivalent to sending a query to a Relational Database Management System (RDBMS). By incorporating reference imagery and Geographic Information System (GIS) data, tracked objects can be associated with physical meanings, and several kinds of high-level reasoning, such as traffic pattern analysis or abnormal activity detection, can be performed. We demonstrate that different types of activities, with hierarchical structure, multiple actors, and context information, are effectively and efficiently defined and inferred using the ERM framework. We also show how visual tracks can be better interpreted as activities by using geographic information. Experimental results on both real visual tracks and GPS traces validate our approach.
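As an illustration of the idea that finding an activity amounts to issuing a database query, here is a minimal sketch using Python's built-in `sqlite3` module. The schema (`track_point`, `region`), the axis-aligned bounding boxes standing in for GIS regions, and the "track enters a region" activity are all assumptions made for this example; they are not taken from the system described in the abstract.

```python
import sqlite3


def find_entries(points, regions):
    """Return sorted (track_id, region_name) pairs for tracks that enter a region.

    points:  iterable of (track_id, t, x, y) tuples (hypothetical track samples)
    regions: iterable of (name, xmin, ymin, xmax, ymax) tuples (hypothetical
             GIS regions, simplified to axis-aligned boxes)
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE track_point (track_id INTEGER, t REAL, x REAL, y REAL)"
    )
    con.execute(
        "CREATE TABLE region "
        "(name TEXT, xmin REAL, ymin REAL, xmax REAL, ymax REAL)"
    )
    con.executemany("INSERT INTO track_point VALUES (?, ?, ?, ?)", points)
    con.executemany("INSERT INTO region VALUES (?, ?, ?, ?, ?)", regions)

    # "Enter" activity: some sample `a` lies outside the region and a later
    # sample `b` of the same track lies inside it.
    query = """
        SELECT DISTINCT a.track_id, r.name
        FROM track_point AS a
        JOIN track_point AS b
          ON b.track_id = a.track_id AND b.t > a.t
        JOIN region AS r
          ON b.x BETWEEN r.xmin AND r.xmax
         AND b.y BETWEEN r.ymin AND r.ymax
        WHERE NOT (a.x BETWEEN r.xmin AND r.xmax
                   AND a.y BETWEEN r.ymin AND r.ymax)
        ORDER BY a.track_id, r.name
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

For example, `find_entries([(1, 0, 0, 0), (1, 1, 5, 5)], [("parking_lot", 4, 4, 6, 6)])` reports that track 1 enters `parking_lot`. A real deployment would presumably use proper spatial types and indexes rather than bounding-box comparisons.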
Creating an obstacle detection system is an important challenge for improving the safety of road vehicles. One way to meet industrial cost requirements is to rely on a single monocular vision sensor. This paper tackles this problem and defines a highly parallelisable image motion segmentation method that takes advantage of the current evolution of multiprocessor computer technology. A complete and modular solution is proposed, based on the Tensor Voting framework extended to the 4D space (x, y, dx, dy), where surfaces describe homogeneous moving areas in the image plane. Watershed segmentation is applied to the result to obtain closed boundaries. Cells are then clustered and labeled with respect to planar parallax rigidity constraints. A visual odometry method, based on texture learning and tracking, is used to estimate residual parallax displacement.
We present a robust, real-time 3-D face tracking and modeling system that provides accurate 6 degree-of-freedom head pose in the presence of large out-of-plane motion, strong expression changes, and partial occlusions. In this paper, we extend the previous 3-D face tracking and modeling framework [10] with automatic initialization, reacquisition, and automatic pose correction. Our system first generates a 3-D face model from a single frontal image. We then extract uniformly distributed random points and track them in 2-D. Given these correspondences, the 3-D head pose is robustly estimated using a RANSAC-PnP process. As the head moves, we dynamically add new feature points to handle a large range of poses. A measure of the accumulated error over time allows an auto-correction mechanism to recover from drift when necessary. If the tracker gets lost, due to motion blur or strong occlusions, the system re-initializes. We present live demo results, which show excellent tracking under large motion (roll: 360°, yaw: ±90°, pitch: -60° to +90°), fast movement, occlusion, and facial expression variations. The system runs at 14 fps on a laptop CPU. In experiments on different datasets, our method achieves state-of-the-art results.
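The control flow described above (initialize, track, correct drift when accumulated error grows, re-initialize when lost) can be sketched as a small state machine. This is only an illustrative sketch: the thresholds `DRIFT_THRESHOLD` and `MIN_INLIERS`, the use of an inlier count as the "lost" criterion, and the `step` function are all hypothetical, since the abstract gives no numeric values or interfaces. Pose estimation itself (the RANSAC-PnP step) is not implemented here; its outputs are passed in as plain numbers.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the abstract does not state any values.
DRIFT_THRESHOLD = 5.0  # accumulated error that triggers a pose correction
MIN_INLIERS = 8        # fewer inliers than this is treated as "tracker lost"


@dataclass
class TrackerState:
    mode: str = "init"             # either "init" or "tracking"
    accumulated_error: float = 0.0
    corrections: int = 0
    reinitializations: int = 0


def step(state, inliers, frame_error):
    """Advance the control loop by one frame and return the updated state.

    inliers:     number of tracked points consistent with the estimated pose
    frame_error: per-frame error measure (for example, mean reprojection error)
    """
    if state.mode == "init":
        # Assume (re)initialization succeeds once enough points are available.
        if inliers >= MIN_INLIERS:
            state.mode = "tracking"
            state.accumulated_error = 0.0
        return state

    if inliers < MIN_INLIERS:
        # Tracker lost (e.g. motion blur or occlusion): go back to init.
        state.mode = "init"
        state.reinitializations += 1
        state.accumulated_error = 0.0
        return state

    # Normal tracking: accumulate error and correct once it exceeds the limit.
    state.accumulated_error += frame_error
    if state.accumulated_error > DRIFT_THRESHOLD:
        state.corrections += 1
        state.accumulated_error = 0.0
    return state
```

In a real tracker, `inliers` and `frame_error` would come from the pose estimator on each frame, and the correction branch would re-estimate the pose rather than just reset a counter.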