Recognizing materials in real-world images is a challenging task. Real-world materials have rich surface texture, geometry, lighting conditions, and clutter, which combine to make the problem particularly difficult. In this paper, we introduce a new, large-scale, open dataset of materials in the wild, the Materials in Context Database (MINC), and combine this dataset with deep learning to achieve material recognition and segmentation of images in the wild. MINC is an order of magnitude larger than previous material databases, while being more diverse and well-sampled across its 23 categories. Using MINC, we train convolutional neural networks (CNNs) for two tasks: classifying materials from patches, and simultaneous material recognition and segmentation in full images. For patch-based classification on MINC we found that the best-performing CNN architectures can achieve 85.2% mean class accuracy. We convert these trained CNN classifiers into an efficient fully convolutional framework combined with a fully connected conditional random field (CRF) to predict the material at every pixel in an image, achieving 73.1% mean class accuracy. Our experiments demonstrate that having a large, well-sampled dataset such as MINC is crucial for real-world material recognition and segmentation.
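As a rough illustration of the patch-classifier-to-dense-prediction step described above, the sketch below (in PyTorch, which the paper does not prescribe; the ResNet backbone and layer sizes are placeholders rather than the paper's trained networks) converts a classifier into a fully convolutional network that emits a per-pixel probability map over the 23 MINC categories. The fully connected CRF refinement is only noted in a comment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

NUM_CLASSES = 23  # MINC has 23 material categories

# Placeholder backbone; the paper's actual architectures and weights differ.
backbone = torchvision.models.resnet50(weights=None)
features = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc

# Reuse a trained linear classification head as a 1x1 convolution so the
# network accepts arbitrary image sizes and produces a coarse grid of scores.
fc = nn.Linear(2048, NUM_CLASSES)  # stands in for the trained patch classifier head
head = nn.Conv2d(2048, NUM_CLASSES, kernel_size=1)
head.weight.data.copy_(fc.weight.data.view(NUM_CLASSES, 2048, 1, 1))
head.bias.data.copy_(fc.bias.data)

def dense_material_probs(image):
    # image: (1, 3, H, W) tensor
    scores = head(features(image))          # (1, 23, H/32, W/32) coarse class scores
    probs = F.softmax(scores, dim=1)
    # Upsample to the input resolution; the paper further refines these
    # per-pixel probabilities with a fully connected CRF.
    return F.interpolate(probs, size=image.shape[-2:], mode="bilinear",
                         align_corners=False)
```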
We propose Deep Feature Interpolation (DFI), a new data-driven baseline for automatic high-resolution image transformation. As the name suggests, DFI relies only on simple linear interpolation of deep convolutional features from pre-trained convnets. We show that despite its simplicity, DFI can perform high-level semantic transformations like "make older/younger", "make bespectacled", "add smile", among others, surprisingly well, sometimes even matching or outperforming the state-of-the-art. This is particularly unexpected as DFI requires no specialized network architecture or even any deep network to be trained for these tasks. DFI can therefore be used as a new baseline to evaluate more complex algorithms, and it provides a practical answer to the question of which image transformation tasks are still challenging after the advent of deep learning.
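The core idea of DFI fits in a few lines, sketched below under simplifying assumptions: the attribute direction is the difference of mean deep features between two image sets (for example, faces with and without glasses), and the edit shifts the source image's features along that direction. The VGG-19 extractor, single-layer features, and scalar alpha are illustrative choices; the paper's exact layer selection and the reverse mapping from target features back to an image (an optimization step) are omitted here.

```python
import torch
import torchvision

# Placeholder feature extractor; not the paper's exact layer combination.
vgg = torchvision.models.vgg19(weights=None).features.eval()

def deep_features(images):
    # images: (N, 3, H, W), all the same resolution in this sketch
    with torch.no_grad():
        return vgg(images).flatten(1)  # one flattened feature vector per image

def attribute_direction(positives, negatives):
    # e.g. positives = photos with the attribute, negatives = similar photos without
    w = deep_features(positives).mean(0) - deep_features(negatives).mean(0)
    return w / w.norm()

def interpolated_target(source_image, w, alpha=4.0):
    # Target feature vector for the transformed image; DFI then recovers an
    # image whose deep features match this target (reverse mapping not shown).
    return deep_features(source_image) + alpha * w
```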
Natural illumination from the sun and sky plays a significant role in the appearance of outdoor scenes. We propose the use of sophisticated outdoor illumination models, developed in the computer graphics community, for estimating appearance and timestamps from a large set of uncalibrated images of an outdoor scene. We first present an analysis of the relationship between these illumination models and the geolocation, time, surface orientation, and local visibility at a scene point. We then use this relationship to devise a data-driven method for estimating per-point albedo and local visibility information from a set of Internet photos taken under varying, unknown illuminations. Our approach significantly extends prior work on appearance estimation to work with sun-sky models, and enables new applications, such as computing timestamps for individual photos using shading information.

Modeling illumination in outdoor scenes. The illumination arriving at a point in an outdoor scene depends on several key factors, including:
• geographic location
• time and date
• surface orientation
• local visibility
Our model describes the irradiance incident at an outdoor scene point on a clear day as a function L(φ, λ, t, α, n), where φ, λ are latitude and longitude, t is the time and date, n is the surface normal, and α is the local visibility angle. This angle α is a parameterization of local visibility based on a model of ambient occlusion proposed by Hauagge et al. [1], which models the local geometry around a point as a cylindrical hole with angle α from the normal to the opening. Figure 1 shows examples of L, in the form of spheres rendered under predicted outdoor illumination at various times and α angles, at a given location on Earth.

Method. A georegistered 3D point cloud built using SfM and MVS provides geographic location (φ, λ), surface normals (n), and a set of observed pixel values for each point (I_x). We first estimate the albedo of each point, then use the albedo to estimate lighting and capture time for each photo.

Estimating albedo. We adopt a simple Lambertian image formation model I_x = ρ_x L_x, where I_x is the observed color of a point x in a given image I, ρ_x is the (assumed constant) albedo at that point, and L_x is the irradiance as defined above. Given many observations of a point, we derive the albedo ρ_x by dividing its average observed color by its expected illumination. Our key insight is that we can use a sun/sky model to predict illumination for a given condition, or indeed the average illumination for a given scene. For a given location, time, and visibility angle, we compute a physically-based environment map (we use the model of Hosek and Wilkie [2]) and, for each normal, integrate over the visible portion of the environment map to produce a database of spheres giving values for L at each normal direction, as illustrated in Figure 1(a-b). We then estimate the expected illumination L(n, α) as a function of normal and visibility angle by taking the average over a set of times sampled throughout the year. For each point x, we have a surface ...
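To make the irradiance term concrete, here is a small numpy sketch, under stated assumptions, of integrating an environment map over the portion of the hemisphere visible through a cone of half-angle α around the normal. The environment map is passed in as a generic callable rather than the Hosek-Wilkie model, and the Monte Carlo integration is an illustrative stand-in for however the paper evaluates the integral.

```python
import numpy as np

def sample_directions(n_samples=20000, seed=0):
    # Uniformly distributed unit directions on the sphere
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n_samples, 3))
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def irradiance(sky_radiance, normal, alpha):
    # sky_radiance: callable mapping unit directions (N, 3) -> RGB radiance (N, 3)
    # normal: unit surface normal (3,); alpha: visibility cone half-angle in radians
    dirs = sample_directions()
    cos_theta = dirs @ normal
    # Keep only directions above the surface and inside the visibility cone.
    visible = (cos_theta > 0) & (np.arccos(np.clip(cos_theta, -1, 1)) <= alpha)
    # Monte Carlo estimate of the cosine-weighted hemispherical integral
    # (uniform-sphere pdf = 1 / (4*pi)).
    return 4 * np.pi * np.mean(
        sky_radiance(dirs) * (cos_theta * visible)[:, None], axis=0)

# Example with a hypothetical uniform grey sky and an upward-facing point:
# E = irradiance(lambda d: np.full((len(d), 3), 0.5),
#                normal=np.array([0.0, 0.0, 1.0]), alpha=np.pi / 2)
```

Averaging such irradiance values over times sampled throughout the year gives an expected illumination per (n, α), and the per-point albedo then follows from the Lambertian model above.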
Image datasets with high-quality pixel-level annotations are valuable for semantic segmentation: labelling every pixel in an image ensures that rare classes and small objects are annotated. However, full-image annotations are expensive, with experts spending up to 90 minutes per image. We propose block sub-image annotation as a replacement for full-image annotation. Despite the attention cost of frequent task switching, we find that block annotations can be crowdsourced at higher quality than full-image annotations at equal monetary cost, using existing annotation tools developed for full-image annotation. Surprisingly, we find that annotating 50% of pixels with blocks allows semantic segmentation to reach performance equivalent to annotating 100% of pixels. Furthermore, annotating as little as 12% of pixels yields 98% of the performance obtained with dense annotation. In weakly-supervised settings, block annotation outperforms existing methods by 3-4% (absolute) given equivalent annotation time. To recover the global structure needed for applications such as characterizing spatial context and affordance relationships, we propose an effective method to inpaint block-annotated images with high-quality labels without additional human effort. As such, fewer annotations can also be used for these applications compared to full-image annotation.
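For intuition only, a minimal sketch of the block-selection side of this protocol (the grid size, random sampling strategy, and fraction are illustrative assumptions, not the paper's tooling): choose a subset of fixed-size blocks covering a target fraction of pixels, and annotate every pixel inside the chosen blocks.

```python
import numpy as np

def sample_blocks(height, width, block=128, fraction=0.5, seed=0):
    # Returns a boolean mask: True where annotators densely label pixels.
    rng = np.random.default_rng(seed)
    rows, cols = height // block, width // block
    n_blocks = rows * cols
    chosen = rng.choice(n_blocks, size=int(round(fraction * n_blocks)),
                        replace=False)
    mask = np.zeros((height, width), dtype=bool)
    for idx in chosen:
        r, c = divmod(idx, cols)
        mask[r * block:(r + 1) * block, c * block:(c + 1) * block] = True
    return mask

# e.g. mask = sample_blocks(1024, 2048, fraction=0.12) for ~12% pixel coverage
```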