This paper addresses the task of estimating spatially-varying reflectance (i.e., SVBRDF) from a single, casually captured image. Central to our method is a highlight-aware (HA) convolution operation and a two-stream neural network equipped with proper training losses. Our HA convolution, a novel variant of the standard (ST) convolution, directly modulates convolution kernels under the guidance of automatically learned masks representing potentially overexposed highlight regions. It helps reduce the impact of strong specular highlights on the diffuse components and, at the same time, hallucinates plausible content in saturated regions. Considering that the variation of saturated pixels also contains important cues for inferring surface bumpiness and specular components, we design a two-stream network that extracts features from two branches built from stacked HA convolutions and ST convolutions, respectively. These two groups of features are further fused in an attention-based manner to facilitate feature selection for each SVBRDF map. The whole network is trained end-to-end with a new perceptual adversarial loss that is particularly useful for enhancing texture details. This design also allows the recovered material maps to be disentangled. We demonstrate through quantitative analysis and qualitative visualization that the proposed method effectively recovers clear SVBRDFs from a single casually captured image and performs favorably against state-of-the-art methods. Since we impose very few constraints on the capture process, even a non-expert user can create high-quality SVBRDFs that serve many graphics applications.
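The core idea of the HA convolution can be illustrated with a short, hypothetical PyTorch sketch. The exact kernel-modulation scheme is defined in the paper; here a learned soft mask of potentially overexposed pixels simply gates the convolution responses, which captures the intuition of suppressing saturated highlights in the diffuse branch. All class names and layer sizes below are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HighlightAwareConv(nn.Module):
    """Illustrative sketch of a highlight-aware (HA) convolution.

    A mask branch predicts a soft map of potentially overexposed highlight
    pixels; the feature branch is then gated by (1 - mask) so that saturated
    regions contribute less to the diffuse features. The paper's actual
    kernel-modulation formulation may differ.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature_conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.mask_conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        features = self.feature_conv(x)
        # Soft highlight mask in [0, 1]: values near 1 mark likely saturated pixels.
        mask = torch.sigmoid(self.mask_conv(x))
        # Down-weight responses inside highlight regions.
        return features * (1.0 - mask)

# Example usage on a 256x256 RGB input.
out = HighlightAwareConv(3, 64)(torch.randn(1, 3, 256, 256))
```

Stacking such gated layers in one branch while keeping plain ST convolutions in the other branch mirrors the two-stream design described in the abstract: one stream sees highlight-suppressed features, the other retains the raw saturation cues.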
In this paper, we propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image. An omnidirectional image has a full field of view, providing a much more complete description of the scene than perspective images. However, the fully convolutional networks that most current solutions rely on fail to capture rich global context from the panorama. To address this issue, as well as the distortion introduced by the equirectangular projection of the panorama, we propose Cubemap Vision Transformers (CViT), a new transformer-based architecture that can model long-range dependencies and extract distortion-free global features from the panorama. We show that cubemap vision transformers have a global receptive field at every stage and can provide globally coherent predictions for spherical signals. To preserve important local features, we further design a convolution-based branch in our pipeline (dubbed GLPanoDepth) and fuse global features from cubemap vision transformers at multiple scales. This global-to-local strategy allows us to fully exploit useful global and local features in the panorama, achieving state-of-the-art performance in panoramic depth estimation.
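A minimal sketch of the cubemap-transformer idea is given below, assuming the equirectangular panorama has already been reprojected into six cubemap faces (the projection step itself is omitted). Tokens from all faces are placed in a single sequence so that self-attention spans the whole sphere, which is what gives the global receptive field at every stage. Names, patch size, and depths are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class CubemapViT(nn.Module):
    """Illustrative sketch of one cubemap vision transformer stage.

    Each of the six distortion-free cubemap faces is split into patch tokens;
    tokens from all faces attend to each other, giving globally coherent
    features for the spherical signal.
    """

    def __init__(self, in_ch=3, dim=256, patch=16, depth=4, heads=8):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, faces):                       # faces: (B, 6, C, H, W)
        b, f, c, h, w = faces.shape
        tokens = self.patch_embed(faces.reshape(b * f, c, h, w))  # (B*6, dim, h', w')
        tokens = tokens.flatten(2).transpose(1, 2)                # (B*6, N, dim)
        tokens = tokens.reshape(b, f * tokens.shape[1], -1)       # one sequence over all faces
        return self.encoder(tokens)                               # global attention over the sphere

# Example usage: six 256x256 cubemap faces of an RGB panorama.
features = CubemapViT()(torch.randn(1, 6, 3, 256, 256))
```

In the full GLPanoDepth pipeline these global features would be fused at multiple scales with a convolutional branch that operates on the panorama to preserve local detail; that fusion is not shown here.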
Existing convolutional neural networks have achieved great success in recovering SVBRDF maps from a single image. However, they mainly focus on handling low-resolution (e.g., 256 × 256) inputs. Ultra-high-resolution (UHR) material maps are notoriously difficult to acquire with existing networks because: 1) finite computational resources set bounds on input receptive fields and output resolutions; and 2) convolutional layers operate locally and lack the ability to capture long-range structural dependencies in UHR images. We propose an implicit neural reflectance model and a divide-and-conquer solution to address these two challenges simultaneously. We first crop a UHR image into low-resolution patches, each of which is processed by a local feature extractor (LFE) to extract important details. To fully exploit long-range spatial dependencies and ensure global coherency, we incorporate a global feature extractor (GFE) and several coordinate-aware feature assembly (CAFA) modules into our pipeline. The GFE contains several lightweight material vision transformers that have a global receptive field at each scale and can infer long-range relationships in the material. After decoding globally coherent feature maps assembled by the CAFA modules, the proposed end-to-end method is able to generate UHR SVBRDF maps from a single image with fine spatial details and consistent global structures.
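The divide-and-conquer strategy can be sketched as follows, under simplifying assumptions: a small CNN stands in for the LFE, a pooled feature from a downsampled copy of the full image stands in for the global branch, and normalized patch-center coordinates make the per-patch assembly coordinate-aware. The real LFE, GFE, and CAFA modules are more elaborate; every name and size here is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchwiseSVBRDFSketch(nn.Module):
    """Illustrative divide-and-conquer sketch for UHR inputs."""

    def __init__(self, dim=64, patch=256):
        super().__init__()
        self.patch = patch
        # Local feature extractor surrogate (per low-resolution crop).
        self.lfe = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(dim, dim, 3, padding=1))
        # Global branch surrogate: one pooled feature from a downsampled copy.
        self.global_branch = nn.Sequential(nn.Conv2d(3, dim, 3, stride=2, padding=1),
                                           nn.AdaptiveAvgPool2d(1))
        # Fuse local features, the global feature, and 2D patch coordinates.
        self.fuse = nn.Conv2d(dim * 2 + 2, dim, 1)

    def forward(self, uhr):                        # uhr: (B, 3, H, W), H and W multiples of patch
        b, _, h, w = uhr.shape
        g = self.global_branch(F.interpolate(uhr, size=(512, 512)))   # (B, dim, 1, 1)
        rows = []
        for y in range(0, h, self.patch):
            cols = []
            for x in range(0, w, self.patch):
                crop = uhr[:, :, y:y + self.patch, x:x + self.patch]
                local = self.lfe(crop)
                # Normalized patch-center coordinates for coordinate-aware assembly.
                coord = torch.tensor([(y + self.patch / 2) / h, (x + self.patch / 2) / w],
                                     device=uhr.device).view(1, 2, 1, 1)
                coord = coord.expand(b, -1, *local.shape[-2:])
                fused = self.fuse(torch.cat(
                    [local, g.expand(-1, -1, *local.shape[-2:]), coord], dim=1))
                cols.append(fused)
            rows.append(torch.cat(cols, dim=3))
        return torch.cat(rows, dim=2)              # re-assembled UHR feature map
```

Processing fixed-size crops keeps memory usage independent of the input resolution, while the shared global feature and the coordinate channels are what keep the stitched-together patches globally consistent.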