We introduce a novel strategy for learning to extract semantically meaningful features from aerial imagery. Instead of manually labeling the aerial imagery, we propose to predict (noisy) semantic features automatically extracted from co-located ground imagery. Our network architecture takes an aerial image as input, extracts features using a convolutional neural network, and then applies an adaptive transformation to map these features into the ground-level perspective. We use an end-to-end learning approach to minimize the difference between the semantic segmentation extracted directly from the ground image and the semantic segmentation predicted solely from the aerial image. We show that a model learned using this strategy, with no additional training, is already capable of rough semantic labeling of aerial imagery. Furthermore, we demonstrate that by fine-tuning this model we can achieve more accurate semantic segmentation than two baseline initialization strategies. We use our network to address the task of estimating the geolocation and geo-orientation of a ground image. Finally, we show how features extracted from an aerial image can be used to hallucinate a plausible ground-level panorama.
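The training signal described above can be illustrated with a minimal sketch: an aerial encoder, a learned transformation toward the ground-level viewpoint, and a per-pixel loss against the (noisy) segmentation derived from the co-located ground image. All layer sizes, class counts, and names below are hypothetical placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AerialToGroundSeg(nn.Module):
    """Toy stand-in for the aerial-to-ground prediction pipeline."""

    def __init__(self, num_classes=4):
        super().__init__()
        # Small stand-in for the aerial feature extractor (a CNN in the paper).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Adaptive transformation into the ground-level perspective,
        # modeled here simply as 1x1 convolutions over the aerial features.
        self.transform = nn.Sequential(
            nn.Conv2d(64, 64, 1), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, aerial):
        feats = self.encoder(aerial)
        logits = self.transform(feats)  # per-pixel class scores
        # Resample to a panorama-shaped grid matching the ground segmentation.
        return F.interpolate(logits, size=(64, 256), mode="bilinear",
                             align_corners=False)

model = AerialToGroundSeg()
aerial = torch.randn(2, 3, 256, 256)               # co-located aerial images
ground_labels = torch.randint(0, 4, (2, 64, 256))  # noisy labels from ground imagery
loss = nn.CrossEntropyLoss()(model(aerial), ground_labels)
loss.backward()                                    # end-to-end training signal
```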
We propose a novel method for detecting horizontal vanishing points and the zenith vanishing point in man-made environments. The dominant trend in existing methods is to first find candidate vanishing points, then remove outliers by enforcing mutual orthogonality. Our method reverses this process: we propose a set of horizon line candidates and score each based on the vanishing points it contains. A key element of our approach is the use of global image context, extracted with a deep convolutional network, to constrain the set of candidates under consideration. Our method does not make a Manhattan-world assumption and can operate effectively on scenes with only a single horizontal vanishing point. We evaluate our approach on three benchmark datasets and achieve state-of-the-art performance on each. In addition, our approach is significantly faster than the previous best method.
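The candidate-then-score idea can be sketched in a few lines: generate horizon line candidates and score each by how strongly it is supported by putative vanishing points (e.g., intersections of detected line segments). The scoring rule, coordinate normalization, and variable names below are simplified illustrations, not the paper's actual formulation.

```python
import numpy as np

def score_horizon_candidates(candidates, vps, sigma=0.02):
    """candidates: (M, 2) array of (slope, offset); vps: (N, 2) putative VPs."""
    scores = []
    for slope, offset in candidates:
        # Vertical distance of each putative vanishing point to y = slope*x + offset.
        d = np.abs(vps[:, 1] - (slope * vps[:, 0] + offset))
        # A candidate is supported by vanishing points lying close to it.
        scores.append(np.exp(-(d / sigma) ** 2).sum())
    return np.array(scores)

rng = np.random.default_rng(0)
vps = rng.uniform(-1, 1, size=(50, 2))                      # normalized image coords
candidates = np.stack([np.zeros(21),                        # horizontal candidates
                       np.linspace(-0.5, 0.5, 21)], axis=1)
best = candidates[np.argmax(score_horizon_candidates(candidates, vps))]
print("best (slope, offset):", best)
```

In the paper, the set of candidates itself is constrained by the global context predicted by the CNN, rather than enumerated uniformly as in this toy example.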
Single image horizon line estimation is one of the most fundamental geometric problems in computer vision. Knowledge of the horizon line enables a wide variety of applications, including image metrology, geometry-aware object detection, and automatic perspective correction. Despite this demonstrated importance, progress on this task has stagnated. We believe the lack of a suitably large and diverse evaluation dataset is the primary cause. Existing datasets [2,3] are often small and were created to evaluate methods that rely on a particular geometric cue (e.g., orthogonal vanishing points). Methods that perform well on such datasets often perform poorly in real-world conditions.

We introduce Horizon Lines in the Wild (HLW), a new dataset for single image horizon line estimation. HLW is several orders of magnitude larger than any existing dataset for horizon line detection (containing 100,553 images), has a much wider variety of scenes and camera perspectives, and was not constructed to highlight the value of any particular geometric cue. The dataset (including models and sample code) is available for download at our project website [1].

Using HLW, we investigate methods for directly estimating the horizon line using convolutional neural networks (CNNs), including both classification and regression formulations. We focus on the GoogLeNet architecture and explore the impact of design and implementation choices on the accuracy of the resulting model. Additionally, we propose two post-processing strategies for aggregating horizon line estimates across subwindows.

Our approach is fast, works in natural and man-made environments, does not fail catastrophically when vanishing point detection is difficult, and outperforms all existing methods on the challenging real-world imagery contained in HLW. Further, when combined with the recent method by Zhai et al. [4], which uses a CNN to provide global context for vanishing point estimation, we obtain state-of-the-art results on two existing benchmark datasets [2,3]. Our main contributions are: 1) a novel approach for using structure from motion to automatically label images with a horizon line, 2) a large evaluation dataset of images with labeled horizon lines, 3) a CNN-based approach for directly estimating the horizon line in a single image, and 4) an extensive evaluation of a variety of CNN design choices.

[1] Horizon Lines in the Wild project website.
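To make the classification formulation concrete, a horizon line can be parameterized by a vertical offset and a slope, each discretized into bins so that a CNN predicts a distribution over bins; a continuous estimate is then recovered from that distribution. The bin ranges, bin counts, and function names below are illustrative placeholders, not the paper's settings.

```python
import numpy as np

offset_bins = np.linspace(-0.5, 0.5, 100)   # offset in units of image height
slope_bins = np.linspace(-0.35, 0.35, 100)  # slope in radians

def horizon_to_labels(offset, slope):
    # Map continuous parameters to bin indices for training a classifier.
    o = int(np.clip(np.digitize(offset, offset_bins) - 1, 0, 99))
    s = int(np.clip(np.digitize(slope, slope_bins) - 1, 0, 99))
    return o, s

def labels_to_horizon(offset_probs, slope_probs):
    # Recover continuous estimates as the expectation over bin centers,
    # which is smoother than simply taking the argmax bin.
    return float(offset_probs @ offset_bins), float(slope_probs @ slope_bins)

o_lbl, s_lbl = horizon_to_labels(0.12, -0.05)
probs = np.full(100, 1e-4)
probs[o_lbl] = 1.0
probs /= probs.sum()                         # mock softmax output from the CNN
print(labels_to_horizon(probs, probs))
```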
Given a single RGB image of a complex outdoor road scene in the perspective view, we address the novel problem of estimating an occlusion-reasoned semantic scene layout in the top-view. This challenging problem not only requires an accurate understanding of both the 3D geometry and the semantics of the visible scene, but also of occluded areas. We propose a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians. But instead of hallucinating RGB values, we show that directly predicting the semantics and depths in the occluded areas enables a better transformation into the top-view. We further show that this initial top-view representation can be significantly enhanced by learning priors and rules about typical road layouts from simulated or, if available, map data. Crucially, training our model does not require costly or subjective human annotations for occluded areas or the top-view, but rather uses readily available annotations for standard semantic segmentation. We extensively evaluate and analyze our approach on the KITTI and Cityscapes datasets.
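The step from per-pixel semantics and depth to a top-view layout can be sketched with simple pinhole geometry: each pixel is back-projected using its depth and dropped into a bird's-eye grid cell. The focal length, grid size, cell resolution, and the overwrite-on-collision behavior below are placeholder assumptions for illustration only.

```python
import numpy as np

def to_top_view(semantics, depth, f=500.0, cx=None, grid=(64, 64), cell=0.5):
    """Scatter per-pixel semantic labels into a top-view occupancy grid."""
    h, w = depth.shape
    cx = w / 2 if cx is None else cx
    us = np.arange(w)[None, :].repeat(h, axis=0)   # pixel column indices
    x = (us - cx) * depth / f                      # lateral position (meters)
    z = depth                                      # forward distance (meters)
    top = np.zeros(grid, dtype=semantics.dtype)
    col = np.clip((x / cell + grid[1] / 2).astype(int), 0, grid[1] - 1)
    row = np.clip((z / cell).astype(int), 0, grid[0] - 1)
    # Pixels that project to the same top-view cell simply overwrite each other here.
    top[row, col] = semantics
    return top

sem = np.random.randint(0, 4, (120, 160))          # mock semantic labels
dep = np.random.uniform(2.0, 30.0, (120, 160))     # mock depth map (meters)
print(to_top_view(sem, dep).shape)
```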
We propose a novel convolutional neural network architecture for estimating geospatial functions such as population density, land cover, or land use. In our approach, we combine overhead and ground-level images in an end-to-end trainable neural network, which uses kernel regression and density estimation to convert features extracted from the ground-level images into a dense feature map. The output of this network is a dense estimate of the geospatial function in the form of a pixel-level labeling of the overhead image. To evaluate our approach, we created a large dataset of overhead and ground-level images from a major urban area with three sets of labels: land use, building function, and building age. We find that our approach is more accurate than baseline methods for all tasks, in some cases dramatically so.
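The kernel-regression step can be illustrated as follows: features extracted from ground-level images at known locations are spread over the overhead image grid with a Gaussian kernel and normalized by a kernel density estimate (Nadaraya-Watson style). The grid size, bandwidth, and function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def dense_feature_map(locations, features, grid_hw=(64, 64), bandwidth=4.0):
    """locations: (N, 2) pixel coords in the overhead image; features: (N, D)."""
    h, w = grid_hw
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1)           # (h*w, 2) cell centers
    d2 = ((grid[:, None, :] - locations[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))                      # (h*w, N) kernel weights
    dens = k.sum(axis=1, keepdims=True) + 1e-8                  # kernel density estimate
    fmap = (k @ features) / dens                                # weighted feature average
    return fmap.reshape(h, w, -1), dens.reshape(h, w)

locs = np.random.uniform(0, 64, size=(20, 2))   # ground-level image locations
feats = np.random.randn(20, 16)                 # their extracted features
fmap, density = dense_feature_map(locs, feats)
print(fmap.shape, density.shape)                # (64, 64, 16) (64, 64)
```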