Abstract. Geographical Information System (GIS) databases contain information about many objects in urban areas, such as traffic signals, road signs, and fire hydrants. This wealth of information can be utilized to assist various computer vision tasks. In this paper, we propose a method for improving object detection using a set of priors acquired from GIS databases. Given a database of object locations from GIS and a query image with metadata, we compute the expected spatial locations of the visible objects in the image plane. We also perform object detection in the query image (e.g., using DPM) and obtain a set of candidate bounding boxes for the objects. Then, we fuse the GIS priors with the candidate detections to find the final object bounding boxes. To cope with various inaccuracies and practical complications, such as noisy metadata, occlusion, inaccuracies in GIS, and poor candidate detections, we formulate the fusion as a higher-order graph matching problem which we solve robustly using RANSAC. We demonstrate that this approach outperforms well-established object detectors, such as DPM, by a large margin. Furthermore, we propose that the GIS objects can be used as cues for discovering the location where an image was taken. Our hypothesis is that the objects visible in an image, along with their relative spatial layout, provide distinctive cues for geo-location. In order to estimate the geo-location based on these generic objects, we perform a search over a dense grid of locations covering the area of interest. We assign a score to each location based on the similarity between its GIS objects and the (imperfect) object detections in the image. We demonstrate that, over a broad urban area of more than 10 square kilometers, this semantic approach can significantly narrow down the localization search space and occasionally even find the correct location.
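The fusion step can be illustrated with a simplified sketch. The paper formulates it as higher-order graph matching solved robustly with RANSAC; the sketch below assumes only a global 2D image-plane offset between the GIS projections and the detection candidates, and all names and values (e.g., `ransac_fuse`, the 30-pixel inlier threshold) are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the authors' implementation): robustly fusing GIS-projected
# object locations with candidate detections by sampling a global 2D offset
# hypothesis and keeping the one with the most inliers (RANSAC-style).
import numpy as np

def ransac_fuse(gis_points, det_centers, n_iters=500, inlier_thresh=30.0, seed=None):
    """gis_points: (N, 2) expected image positions projected from GIS.
    det_centers: (M, 2) centers of candidate detection bounding boxes.
    Returns (offset, matches) where matches[i] is the index of the detection
    assigned to GIS object i, or -1 if no detection lies within the threshold."""
    rng = np.random.default_rng(seed)
    best_offset, best_inliers = np.zeros(2), -1
    for _ in range(n_iters):
        # Hypothesize a global 2D offset from one random GIS-detection pair.
        i = rng.integers(len(gis_points))
        j = rng.integers(len(det_centers))
        offset = det_centers[j] - gis_points[i]
        # Count GIS objects that land near some detection under this offset.
        shifted = gis_points + offset
        d = np.linalg.norm(shifted[:, None, :] - det_centers[None, :, :], axis=2)
        inliers = int((d.min(axis=1) < inlier_thresh).sum())
        if inliers > best_inliers:
            best_inliers, best_offset = inliers, offset
    # Final assignment under the best offset (greedy nearest detection).
    shifted = gis_points + best_offset
    d = np.linalg.norm(shifted[:, None, :] - det_centers[None, :, :], axis=2)
    matches = np.where(d.min(axis=1) < inlier_thresh, d.argmin(axis=1), -1)
    return best_offset, matches
```

Under this reading, a GIS object left unmatched (index -1) could be treated as occluded or missed by the detector, while matched detections are the ones kept as final bounding boxes.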
Abstract. Egocentric cameras are becoming increasingly popular and provide us with large amounts of video captured from the first-person perspective. At the same time, surveillance cameras and drones offer an abundance of visual information, often captured from a top view. Although these two sources of information have been studied separately in the past, they have not been studied jointly and related to one another. Given a set of egocentric cameras and a top-view camera capturing the same area, we propose a framework to identify the egocentric viewers in the top-view video. We utilize two types of features for our assignment procedure. Unary features encode what a viewer (seen from the top view or recording an egocentric video) visually experiences over time. Pairwise features encode the relationship between the visual content of a pair of viewers. Modeling each view (egocentric or top) as a graph, the assignment process is formulated as spectral graph matching. Evaluating our method on a dataset of 50 top-view and 188 egocentric videos taken in different scenarios demonstrates the effectiveness of the proposed approach in assigning egocentric viewers to identities present in the top-view camera. We also study the effect of different parameters, such as the number of egocentric viewers and the visual features used. An extended abstract of this effort has been reported in [1].
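As a rough illustration of the assignment step, the following sketch builds an affinity matrix over candidate (egocentric viewer, top-view identity) assignments from unary and pairwise similarities and discretizes its principal eigenvector, in the spirit of standard spectral graph matching. The input arrays and the function name `spectral_assign` are assumptions for illustration, not the paper's exact features or formulation.

```python
# Minimal sketch of spectral graph matching for assigning egocentric viewers to
# top-view identities; similarities are assumed to be precomputed elsewhere.
import numpy as np

def spectral_assign(unary_sim, pairwise_sim):
    """unary_sim: (E, T) similarity between egocentric viewer e and top-view identity t.
    pairwise_sim: (E, E, T, T) similarity between the relation of viewers (e1, e2)
    and the relation of identities (t1, t2).
    Returns a dict mapping each egocentric viewer index to a top-view identity index."""
    E, T = unary_sim.shape
    n = E * T                                 # one node per candidate assignment (e, t)
    idx = lambda e, t: e * T + t
    M = np.zeros((n, n))
    for e1 in range(E):
        for t1 in range(T):
            M[idx(e1, t1), idx(e1, t1)] = unary_sim[e1, t1]   # unary terms on the diagonal
            for e2 in range(E):
                for t2 in range(T):
                    if e1 != e2 and t1 != t2:                 # only mutually consistent pairs
                        M[idx(e1, t1), idx(e2, t2)] = pairwise_sim[e1, e2, t1, t2]
    M = 0.5 * (M + M.T)                       # symmetrize before eigendecomposition
    # The principal eigenvector scores each candidate assignment.
    _, vecs = np.linalg.eigh(M)
    x = np.abs(vecs[:, -1])
    # Greedy discretization enforcing a one-to-one assignment.
    assignment, used_t = {}, set()
    for k in np.argsort(-x):
        e, t = divmod(k, T)
        if e not in assignment and t not in used_t:
            assignment[e] = t
            used_t.add(t)
    return assignment
```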
The availability of GIS (Geographical Information System) databases for many urban areas provides a valuable source of information for improving the performance of many computer vision tasks. In this paper, we propose a method that leverages information acquired from GIS databases to perform semantic segmentation of an image while geo-referencing each semantic segment with its address and geo-location. First, the image is segmented into a set of initial super-pixels. Then, by projecting the information from GIS databases, a set of priors is obtained about the approximate locations of semantic entities, such as buildings and streets, in the image plane. However, there are significant inaccuracies (misalignments) in the projections, mainly due to inaccurate GPS tags and camera parameters. In order to address this misalignment issue, we perform a data fusion that improves the segmentation and the accuracy of the GIS projections simultaneously, using an iterative approach. At each iteration, the projections are evaluated and weighted in terms of reliability, and then fused with the super-pixel segmentation. First, segmentation is performed using random walks, seeded by the GIS projections. Then, the global transformation that best aligns the projections to their corresponding semantic entities is computed and applied to the projections to further align them with the content of the image. The iterations continue until the projections and segments are well aligned.
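A minimal sketch of this iterative loop might look as follows, assuming a scikit-image random-walker segmenter and a simple global 2D offset as the alignment transformation. The helper name `iterative_gis_alignment`, the seed rasterization, and the convergence tolerance are illustrative assumptions, not the authors' implementation (which operates on super-pixels, reliability-weighted projections, and a more general global transformation).

```python
# Sketch of the alternating GIS-projection / segmentation alignment loop:
# (1) seed a random-walker segmentation from the current GIS projections,
# (2) re-estimate a global offset aligning projections to their segments,
# and repeat until the correction becomes small.
import numpy as np
from skimage.segmentation import random_walker  # standard random-walks segmenter

def iterative_gis_alignment(image_gray, gis_points, gis_labels, n_iters=10, tol=1.0):
    """image_gray: (H, W) float image.
    gis_points: (N, 2) projected (row, col) locations of GIS entities.
    gis_labels: (N,) semantic label per projected point (1 = building, 2 = street, ...).
    Returns the final label map and the accumulated global offset."""
    H, W = image_gray.shape
    offset = np.zeros(2)
    labels = np.zeros((H, W), dtype=int)
    for _ in range(n_iters):
        # 1) Rasterize the (shifted) GIS projections into sparse seeds; 0 = unlabeled.
        seeds = np.zeros((H, W), dtype=int)
        pts = np.round(gis_points + offset).astype(int)
        valid = (pts[:, 0] >= 0) & (pts[:, 0] < H) & (pts[:, 1] >= 0) & (pts[:, 1] < W)
        seeds[pts[valid, 0], pts[valid, 1]] = gis_labels[valid]
        # 2) Segment the image with random walks using the GIS-derived seeds.
        labels = random_walker(image_gray, seeds, beta=130, mode='bf')
        # 3) Re-estimate the global offset: move each label's projections toward
        #    the centroid of its segment, then average the per-label corrections.
        moves = []
        for lab in np.unique(gis_labels):
            seg_pix = np.argwhere(labels == lab)
            proj_pix = pts[valid & (gis_labels == lab)]
            if len(seg_pix) and len(proj_pix):
                moves.append(seg_pix.mean(axis=0) - proj_pix.mean(axis=0))
        if not moves:
            break
        step = np.mean(moves, axis=0)
        offset += step
        if np.linalg.norm(step) < tol:  # projections and segments agree; stop iterating
            break
    return labels, offset
```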