Maps are a key component in image-based camera localization and visual SLAM systems: they are used to establish geometric constraints between images, correct drift in relative pose estimation, and relocalize cameras after lost tracking. The exact definitions of maps, however, are often application-specific and hand-crafted for different scenarios (e.g. 3D landmarks, lines, planes, bags of visual words). We propose to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation. Unlike prior work on learning maps, MapNet exploits cheap and ubiquitous sensory inputs like visual odometry and GPS in addition to images and fuses them together for camera localization. Geometric constraints expressed by these inputs, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as loss terms in MapNet training and also used during inference. In addition to directly improving localization accuracy, this allows us to update the MapNet (i.e., maps) in a self-supervised manner using additional unlabeled video sequences from the scene. We also propose a novel parameterization for camera rotation which is better suited for deep-learning-based camera pose regression. Experimental results on both the indoor 7-Scenes dataset and the outdoor Oxford RobotCar dataset show significant performance improvement over prior work. The MapNet project webpage is https://goo.gl/mRB3Au.
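To make the loss formulation concrete, below is a minimal sketch (assumed tensor shapes and weights, not the authors' released code) of two ingredients named in the abstract: a log-quaternion parameterization of rotation, and a relative-pose term, standing in for the visual-odometry constraint, added on top of the usual absolute-pose loss.

```python
# Minimal sketch of a MapNet-style pose loss; shapes, weights, and the
# simplified relative-pose computation are assumptions, not the paper's code.
import torch

def qlog(q):
    """Map a unit quaternion (w, x, y, z) to its 3-D logarithm."""
    w, v = q[..., :1], q[..., 1:]
    norm_v = v.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    return torch.acos(w.clamp(-1.0, 1.0)) * v / norm_v

def pose_loss(pred, gt, beta=1.0):
    """L1 loss on translation and log-quaternion rotation.
    pred, gt: (N, 6) tensors = [x, y, z, 3-D log-quaternion]."""
    t_err = (pred[:, :3] - gt[:, :3]).abs().mean()
    r_err = (pred[:, 3:] - gt[:, 3:]).abs().mean()
    return t_err + beta * r_err

def mapnet_loss(pred_abs, gt_abs, gamma=1.0):
    """Absolute-pose loss plus a relative-pose term between consecutive
    frames, standing in for the visual-odometry / GPS constraint."""
    loss = pose_loss(pred_abs, gt_abs)
    # Relative poses approximated here as simple differences of the 6-D
    # pose vectors between consecutive frames (a simplification).
    rel_pred = pred_abs[1:] - pred_abs[:-1]
    rel_gt = gt_abs[1:] - gt_abs[:-1]
    return loss + gamma * pose_loss(rel_pred, rel_gt)

# Usage with stand-in predictions from a hypothetical pose-regression CNN:
pred = torch.randn(4, 6, requires_grad=True)
gt = torch.randn(4, 6)
mapnet_loss(pred, gt).backward()
```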
Camera spectral sensitivity functions relate scene radiance to captured RGB triplets. They are important for many computer vision tasks that use color information, such as multispectral imaging, color rendering, and color constancy. In this paper, we aim to explore the space of spectral sensitivity functions for digital color cameras. After collecting a database of 28 cameras covering a variety of types, we find that this space is convex and two-dimensional. Based on this statistical model, we propose two methods to recover camera spectral sensitivities from a single image of a regular reflective color target (e.g., a color checker), with and without knowledge of the illumination. We show that the proposed model is more accurate and robust for estimating camera spectral sensitivities than other basis functions. We also show two applications of the recovered camera spectral sensitivities: simulation of color rendering for cameras and computational color constancy.
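Under the two-dimensional model, recovery with a known illuminant reduces to a small least-squares problem. The sketch below illustrates this for one color channel; the arrays mean_s, basis, reflectances, illum, and measured are stand-ins, not data or code from the paper.

```python
# Minimal sketch of recovering one channel's spectral sensitivity with a
# low-dimensional model: the unknown curve is a mean plus two basis functions,
# and the coefficients are solved by least squares from color-checker
# measurements under a known illuminant. All inputs are placeholders.
import numpy as np

def recover_sensitivity(mean_s, basis, reflectances, illum, measured):
    """
    mean_s       : (W,)   mean sensitivity over the camera database
    basis        : (W, 2) first two principal components of the database
    reflectances : (P, W) spectral reflectances of the P color-checker patches
    illum        : (W,)   known illuminant spectrum
    measured     : (P,)   camera responses for this channel (linear RAW)
    Returns the estimated (W,) sensitivity curve.
    """
    # Each patch response is modeled as r_p . (illum * s), which is linear in s.
    A = reflectances * illum[None, :]              # (P, W)
    # Substitute s = mean_s + basis @ c and solve for the 2-D coefficients c.
    rhs = measured - A @ mean_s                    # residual after the mean term
    coeffs, *_ = np.linalg.lstsq(A @ basis, rhs, rcond=None)
    return mean_s + basis @ coeffs

# Usage with random stand-in data (24 patches, 33 wavelength samples):
W, P = 33, 24
rng = np.random.default_rng(0)
mean_s, basis = rng.random(W), rng.random((W, 2))
refl, illum = rng.random((P, W)), rng.random(W)
true_s = mean_s + basis @ np.array([0.3, -0.2])
measured = (refl * illum) @ true_s
print(np.allclose(recover_sensitivity(mean_s, basis, refl, illum, measured), true_s))
```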
Cameras face a fundamental tradeoff between spatial and temporal resolution: digital still cameras can capture images with high spatial resolution, but most high-speed video cameras suffer from low spatial resolution. It is hard to overcome this tradeoff without incurring a significant increase in hardware cost. In this paper, we propose techniques for sampling, representing, and reconstructing the space-time volume in order to overcome this tradeoff. Our approach has two important distinctions from previous work: (1) we achieve a sparse representation of videos by learning an over-complete dictionary on video patches, and (2) we adhere to practical constraints on the sampling scheme imposed by the architectures of present image sensors. Consequently, our sampling scheme can be implemented on image sensors with a straightforward modification to the control unit. To demonstrate the power of our approach, we have implemented a prototype imaging system with per-pixel coded exposure control using a liquid crystal on silicon (LCoS) device. Using both simulations and experiments on a wide range of scenes, we show that our method can effectively reconstruct a video from a single captured image while maintaining high spatial resolution.
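The reconstruction step can be illustrated with a toy compressive-sensing example: a space-time patch assumed sparse in a learned dictionary is recovered from a single per-pixel coded image by orthogonal matching pursuit. The sketch below uses a random stand-in dictionary and a one-frame-per-pixel exposure code as simplifications; it is not the authors' pipeline, and dictionary learning (e.g., on training video patches) is omitted.

```python
# Toy sketch: recover a sparse space-time patch x from a single coded image
# y = S x, where S encodes per-pixel coded exposure and D is an (assumed)
# learned over-complete dictionary. Not the authors' implementation.
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: find a k-sparse code a with A @ a ~= y."""
    residual, support = y.copy(), []
    code = np.zeros(A.shape[1])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ sol
    code[support] = sol
    return code

rng = np.random.default_rng(0)
n_pixels, n_frames, n_atoms, sparsity = 64, 9, 256, 5
patch_dim = n_pixels * n_frames                       # space-time patch size
D = rng.standard_normal((patch_dim, n_atoms))         # stand-in learned dictionary
D /= np.linalg.norm(D, axis=0)

# Per-pixel coded exposure (simplified): each pixel integrates exactly one
# random frame, so the sensor reads out a single coded image of n_pixels values.
S = np.zeros((n_pixels, patch_dim))
for p in range(n_pixels):
    S[p, p * n_frames + rng.integers(n_frames)] = 1.0

true_code = np.zeros(n_atoms)
true_code[rng.choice(n_atoms, sparsity, replace=False)] = rng.standard_normal(sparsity)
x = D @ true_code                                     # ground-truth space-time patch
y = S @ x                                             # single coded image

x_hat = D @ omp(S @ D, y, sparsity)                   # reconstructed video patch
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # relative reconstruction error
```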
Figure 1. This paper proposes a deep neural architecture, PlaneRCNN, that detects planar regions and reconstructs a piecewise planar depth map from a single RGB image. From left to right: an input image, segmented planar regions, the estimated depth map, and reconstructed planar surfaces.

This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar surfaces from a single RGB image. PlaneRCNN employs a variant of Mask R-CNN to detect planes together with their plane parameters and segmentation masks. PlaneRCNN then jointly refines all the segmentation masks with a novel loss enforcing consistency with a nearby view during training. The paper also presents a new benchmark with more fine-grained plane segmentations in the ground truth, on which PlaneRCNN outperforms existing state-of-the-art methods by significant margins in the plane detection, segmentation, and reconstruction metrics. PlaneRCNN makes an important step towards robust plane extraction, which would have an immediate impact on a wide range of applications including robotics, augmented reality, and virtual reality.

Mask R-CNN was originally designed for semantic segmentation, where images contain instances of varying categories (e.g., person, car, train, bicycle, and more). Our problem has only two categories, "planar" or "non-planar",
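The nearby-view consistency idea can be sketched as a depth-warping check: the depth implied by a detected plane is unprojected to 3-D, transformed into the nearby view with the known relative pose, and compared against that view's depth. The intrinsics, pose, and plane-to-depth conversion below are illustrative assumptions, not the authors' implementation (occlusion and out-of-bounds handling are omitted).

```python
# Minimal sketch of a cross-view depth-consistency check for a single plane.
# Assumes the plane is in front of the camera; all inputs are placeholders.
import numpy as np

def plane_depth(plane, K, h, w):
    """Per-pixel depth of a plane written as n/d, so plane . X = 1 on the plane."""
    ys, xs = np.mgrid[0:h, 0:w]
    rays = np.linalg.inv(K) @ np.stack([xs, ys, np.ones_like(xs)], 0).reshape(3, -1)
    depth = 1.0 / (plane @ rays)                # plane.(z*ray) = 1  =>  z = 1/(plane.ray)
    return depth.reshape(h, w), rays

def consistency_loss(plane, K, R, t, depth_nearby):
    """Mean |depth difference| between the warped plane depth and the nearby view."""
    h, w = depth_nearby.shape
    depth, rays = plane_depth(plane, K, h, w)
    pts = rays * depth.reshape(1, -1)           # 3-D points in the current camera
    pts_nb = R @ pts + t[:, None]               # same points in the nearby camera
    uv = K @ pts_nb
    u = np.clip(np.round(uv[0] / uv[2]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[1] / uv[2]).astype(int), 0, h - 1)
    return np.mean(np.abs(pts_nb[2] - depth_nearby[v, u]))

# Usage with an identity relative pose and a fronto-parallel plane z = 2,
# so the warped depth matches the nearby depth exactly and the loss is 0:
K = np.array([[100.0, 0, 32], [0, 100.0, 24], [0, 0, 1]])
plane = np.array([0.0, 0.0, 0.5])               # normal (0,0,1), offset 2
print(consistency_loss(plane, K, np.eye(3), np.zeros(3), np.full((48, 64), 2.0)))
```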