Estimating local surface orientation (slant and tilt) is fundamental to recovering the three-dimensional structure of the environment. It is unknown how well humans perform this task in natural scenes. Here, with a database of natural stereo-images having groundtruth surface orientation at each pixel, we find dramatic differences in human tilt estimation with natural and artificial stimuli. Estimates are precise and unbiased with artificial stimuli and imprecise and strongly biased with natural stimuli. An image-computable Bayes optimal model grounded in natural scene statistics predicts human bias, precision, and trial-by-trial errors without fitting parameters to the human data. The similarities between human and model performance suggest that the complex human performance patterns with natural stimuli are lawful, and that human visual systems have internalized local image and scene statistics to optimally infer the three-dimensional structure of the environment. These results generalize our understanding of vision from the lab to the real world.
We present a unified Bayesian approach to shape representation and related problems in perceptual organization, including part decomposition, shape similarity, figure/ground estimation, and 3D shape. The approach is based on the idea of estimating the skeletal structure most likely to have generated the observed shape via a process of stochastic "growth." We survey the approach briefly and show how it can be extended in a principled way to solve a wide array of related problems.
Shape and perceptual organization
The visual representation of shape is a complex problem, requiring the reduction of an essentially infinite-dimensional object (the geometry of the shape) to a few perceptually meaningful dimensions. Human infants can recognize shape from line drawings without any prior experience [17], suggesting that the ability to abstract form from the bounding contour is innate. Much research in the study of shape has involved a quest for a set of shape descriptors that will allow just the right aspects of shape to be extracted: a representation that retains enough information to support recognition, shape similarity, and other key functions. [...] [8], and so forth, has merits. Some have compelling mathematical motivations, while others (unfortunately not usually the same ones) have demonstrable agreement with human data. Still, broadly speaking, a complete computational characterization of human shape representation remains elusive.
Estimating local surface orientation (slant and tilt) is fundamental to recovering the three-dimensional structure of the environment, but it is unknown how well humans perform this task in natural scenes. Here, with a high-fidelity database of natural stereo-images with groundtruth surface orientation at each pixel, we find dramatic differences in human tilt estimation with natural and artificial stimuli. With artificial stimuli, estimates are precise and unbiased. With natural stimuli, estimates are imprecise and strongly biased. An image-computable normative model grounded in natural scene statistics predicts human bias, precision, and trial-by-trial errors without fitting parameters to the human data. These similarities suggest that the complex human performance patterns with natural stimuli are lawful, and that human visual systems have internalized local image and scene statistics to optimally infer the three-dimensional structure of the environment. The current results help generalize our understanding of human vision from the lab to the real world.
Visual systems estimate the three-dimensional (3D) structure of scenes from information in two-dimensional (2D) retinal images. Visual systems use multiple sources of information to improve the accuracy of these estimates, including statistical knowledge of the probable spatial arrangements of natural scenes. Here, we examine how 3D surface tilts are spatially related in real-world scenes, and show that humans pool information across space when estimating surface tilt in accordance with these spatial relationships. We develop a hierarchical model of surface tilt estimation that is grounded in the statistics of tilt in natural scenes and images. The model computes a global tilt estimate by pooling local tilt estimates within an adaptive spatial neighborhood. The spatial neighborhood in which local estimates are pooled changes according to the value of the local estimate at a target location. The hierarchical model provides more accurate estimates of groundtruth tilt in natural scenes and provides a better account of human performance than the local estimates. Taken together, the results imply that the human visual system pools information about surface tilt across space in accordance with natural scene statistics.
Visual systems estimate the three-dimensional (3D) structure of scenes from information in two-dimensional (2D) retinal images. Visual systems use multiple sources of information to improve the accuracy of these estimates, including statistical knowledge of the probable spatial arrangements of natural scenes. Here, we examine how 3D surface tilts are spatially related in real-world scenes, and show that humans pool information across space when estimating surface tilt in accordance with these spatial relationships. We develop a hierarchical model of surface tilt estimation that is grounded in the statistics of tilt in natural scenes and images. The model computes a global tilt estimate by pooling local tilt estimates within an adaptive spatial neighborhood. The spatial neighborhood in which local estimates are pooled changes according to the value of the local estimate at a target location. The hierarchical model provides more accurate estimates of groundtruth tilt in natural scenes and provides a better account of human performance than the local model. Taken together, the results imply that the human visual system pools information about surface tilt across space in accordance with natural scene statistics.
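The pooling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the pooling is computed, so everything here is an assumption made for illustration: tilt is treated as a circular variable in degrees, pooling is a plain circular mean, the neighborhood is a disc, and the names `circular_mean_deg`, `pool_tilt`, and `radius_for_tilt` are hypothetical. It is not the authors' model.

```python
import math


def circular_mean_deg(angles_deg):
    """Circular mean of angles given in degrees, returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    mean = math.degrees(math.atan2(s, c)) % 360.0
    # Guard against floating-point rounding that can yield exactly 360.0.
    return 0.0 if mean >= 360.0 else mean


def pool_tilt(local_tilts, target, radius_for_tilt):
    """Pool local tilt estimates around a target pixel.

    local_tilts: dict mapping (row, col) -> local tilt estimate in degrees.
    target: (row, col) of the pixel whose global estimate is wanted.
    radius_for_tilt: hypothetical callable mapping the target's local tilt
        to a pooling radius in pixels (this is what makes the neighborhood
        "adaptive" in this sketch; the real rule is not given in the abstract).
    """
    r0, c0 = target
    radius = radius_for_tilt(local_tilts[target])
    # Collect every local estimate inside the disc centered on the target.
    neighbors = [
        tilt
        for (r, c), tilt in local_tilts.items()
        if math.hypot(r - r0, c - c0) <= radius
    ]
    return circular_mean_deg(neighbors)
```

A circular mean is used rather than an arithmetic mean because tilt wraps around: averaging 350° and 10° arithmetically gives 180°, whereas the circular mean lies at the wrap-around point near 0°.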