ABSTRACT: We perceive the shapes and material properties of objects quickly and reliably despite the complexity and objective ambiguities of natural images. Typical images are highly complex because they consist of many objects embedded in background clutter. Moreover, the image features of an object are extremely variable and ambiguous due to the effects of projection, occlusion, background clutter, and illumination. The very success of everyday vision implies neural mechanisms, yet to be understood, that discount irrelevant information and organize ambiguous or "noisy" local image features into objects and surfaces. Recent work in Bayesian theories of visual perception has shown how complexity may be managed and ambiguity resolved through the task-dependent, probabilistic integration of prior object knowledge with image features.
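As a minimal sketch of the kind of task-dependent probabilistic integration the abstract describes, the toy Python example below combines an ambiguous image likelihood with a prior over scene hypotheses via Bayes' rule. The hypotheses, the numbers, and the "light-from-above" prior are illustrative assumptions, not values or a model taken from the paper.

```python
import numpy as np

# Two scene hypotheses that could have produced nearly the same shading
# pattern, so the image likelihood alone is ambiguous (hypothetical values).
hypotheses = ["convex", "concave"]

# Likelihood P(image | hypothesis): almost identical for both hypotheses.
likelihood = np.array([0.52, 0.48])

# Prior P(hypothesis): e.g., a learned "light comes from above" bias
# makes the convex interpretation more probable a priori.
prior = np.array([0.8, 0.2])

# Posterior P(hypothesis | image) ∝ P(image | hypothesis) * P(hypothesis):
# prior knowledge resolves what the image features leave ambiguous.
posterior = likelihood * prior
posterior /= posterior.sum()

for h, p in zip(hypotheses, posterior):
    print(f"P({h} | image) = {p:.3f}")
```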
We argue that the study of human vision should be aimed at determining how humans perform natural tasks on natural images. Attempts to understand the phenomenology of vision from artificial stimuli, though worthwhile as a starting point, risk leading to faulty generalizations about visual systems. In view of the enormous complexity of natural images, such attempts are like trying to evaluate the performance of a soldier in battle from his ability at playing with a water pistol. Dealing with this complexity is daunting, but Bayesian inference on structured probability distributions offers a way to build theories of vision that can cope with the complexity of natural images and that use analysis-by-synthesis strategies with intriguing similarities to the brain.
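To make the analysis-by-synthesis idea concrete, here is a hedged Python sketch: a toy generative model renders candidate scene descriptions, and inference scores each candidate by how well its synthesized image matches the observed one, weighted by a prior. The one-blob "graphics engine", the Gaussian noise model, and the centrality prior are all illustrative assumptions, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(scene):
    """Toy 'graphics engine': a scene is (position, brightness) of one blob."""
    pos, brightness = scene
    x = np.arange(32)
    return brightness * np.exp(-0.5 * ((x - pos) / 2.0) ** 2)

# Observed image: a blob at position 20 plus sensor noise.
observed = render((20, 1.0)) + 0.05 * rng.standard_normal(32)

def log_posterior(scene, image, sigma=0.05):
    # Gaussian image likelihood plus a weak prior preferring central positions.
    log_lik = -0.5 * np.sum((image - render(scene)) ** 2) / sigma**2
    log_prior = -0.5 * ((scene[0] - 16) / 10.0) ** 2
    return log_lik + log_prior

# "Analysis by synthesis": search over synthesized candidates for the
# scene description that best explains the observed image.
candidates = [(pos, 1.0) for pos in range(32)]
best = max(candidates, key=lambda s: log_posterior(s, observed))
print("inferred blob position:", best[0])
```

In a structured model, the flat search over candidates would be replaced by inference over the parts of the probability distribution, but the scoring logic is the same.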
Visual perception involves the grouping of individual elements into coherent patterns that reduce the descriptive complexity of a visual scene. The physiological basis of this perceptual simplification remains poorly understood. We used functional MRI to measure activity in a higher object processing area, the lateral occipital complex, and in primary visual cortex in response to visual elements that were either grouped into objects or randomly arranged. We observed significant activity increases in the lateral occipital complex and concurrent reductions of activity in primary visual cortex when elements formed coherent shapes, suggesting that activity in early visual areas is reduced as a result of grouping processes performed in higher areas. These findings are consistent with predictive coding models of vision that postulate that inferences of high-level areas are subtracted from incoming sensory information in lower areas through cortical feedback.

One of the extraordinary capabilities of the human visual system is its ability to rapidly group elements in a complex visual scene, a process that can greatly simplify the description of an image. For example, a collection of parallel lines can be described as a single texture pattern without specifying the location, length, and orientation of each element within the pattern. Such grouping processes are reflected in the activities of neurons at various stages of the visual system. For example, the response of a neuron in primary visual cortex (V1) to a single visual element can be suppressed if the element in its receptive field shares the same orientation as surrounding elements, or enhanced if orientations differ (1). These pattern context effects in V1 are thought to be mediated by both local connections (2) and interactions with higher areas (3).

In natural scenes, elements are often grouped when they are perceived as belonging to the same object. This case is particularly interesting from a physiological perspective because object shape is a feature that is represented only in higher stages of the visual system, so any influence of perceived shape on lower areas would require feedback processes. Although feedback is generally thought of as a process where activity in lower areas is enhanced by activity occurring in higher areas, recent work on probabilistic models has pointed to the importance of a phenomenon termed "explaining away": a competition that occurs between alternative hypotheses when attempting to infer the probable cause of an event (4). When applied to models of visual perception, perceptual hypotheses are thought to compete via feedback connections from higher visual areas projecting their predictions about the stimulus to lower stages, where they are then subtracted from incoming data. According to such predictive coding models, the activity of neurons in lower stages will decrease when neurons in higher stages can "explain" a visual stimulus (5, 6). These models can be contrasted with traditional feature-detection models, which posit that...
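The subtractive feedback loop that these predictive coding models describe can be illustrated with a small Python sketch in the spirit of such models (an illustration under assumed toy dynamics, not the paper's model). A higher area holds a cause estimate r; its feedback prediction W @ r is subtracted from the input, so the lower area carries only the residual error. When the higher area can "explain" the stimulus, lower-area activity shrinks while higher-area activity grows, mirroring the fMRI result.

```python
import numpy as np

rng = np.random.default_rng(1)

W = rng.standard_normal((16, 4)) * 0.5   # feedback (generative) weights
r = np.zeros(4)                          # higher-area cause estimate

# A "groupable" stimulus: one the higher area can actually account for,
# i.e., it lies in the span of the generative weights (assumed setup).
stimulus = W @ np.array([1.0, -0.5, 0.0, 0.3])

lr = 0.1
for step in range(200):
    prediction = W @ r             # top-down prediction sent via feedback
    error = stimulus - prediction  # lower-area residual ("explaining away")
    r += lr * (W.T @ error)        # higher area updates to reduce the error

print("lower-area activity (error norm):", np.linalg.norm(error))
print("higher-area activity (|r|):", np.linalg.norm(r))
```

For a randomly arranged stimulus outside the model's span, the residual error would stay large, which is the predictive coding account of why ungrouped elements leave V1 activity high.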