Figure 1: Samples from our generators trained on several challenging datasets (LSUN Churches, FFHQ, Landscapes, Satellite-Buildings, Satellite-Landscapes) at resolution 256 × 256. The images are generated without spatial convolutions, upsampling, or self-attention operations. No interaction between pixels takes place during inference.
We present a new method for lightweight novel-view synthesis that generalizes to arbitrary forward-facing scenes. Recent approaches are computationally expensive, require per-scene optimization, or produce memory-expensive representations. We start by representing the scene with a set of fronto-parallel semitransparent planes and then convert them to deformable layers in an end-to-end manner. Additionally, we employ a feed-forward refinement procedure that corrects the estimated representation by aggregating information from the input views. Our method requires no fine-tuning when a new scene is processed and can handle an arbitrary number of views without restriction. Experimental results show that our approach surpasses recent models in terms of both common metrics and human evaluation, with a noticeable advantage in inference speed and in the compactness of the inferred layered geometry.
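The fronto-parallel semitransparent planes mentioned above are the standard multiplane-image representation, which is rendered by back-to-front "over" alpha compositing. A minimal sketch of that compositing step (an illustration of the representation, not the authors' implementation; the function name and array layout are assumptions):

```python
import numpy as np

def composite_planes(rgb, alpha):
    """Back-to-front 'over' compositing of D fronto-parallel
    semitransparent planes (standard multiplane-image rendering).

    rgb:   (D, H, W, 3) per-plane colors; plane 0 is the farthest.
    alpha: (D, H, W, 1) per-plane opacities in [0, 1].
    Returns the composited (H, W, 3) image.
    """
    out = np.zeros(rgb.shape[1:], dtype=np.float64)  # start fully black
    for color, a in zip(rgb, alpha):                 # iterate far to near
        out = color * a + out * (1.0 - a)            # 'over' operator
    return out
```

Novel views are obtained by warping each plane into the target camera before compositing; the sketch above shows only the final blend.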