We are continuously surrounded by a noisy and ever-changing environment. Instead of analyzing every element in a scene, our visual system compresses an enormous amount of visual information into ensemble representations, such as perceiving a forest rather than every single tree. Still, it is unclear why such complex scenes appear the same from moment to moment despite fluctuations, noise, and discontinuities in retinal images. Phenomena such as change blindness are usually thought to stabilize scene perception by making us unaware of minor inconsistencies between scenes. Here, we propose an alternative: stable scene perception is actively achieved by the visual system through global serial dependencies, in which the appearance of scene gist is sequentially dependent on the gist perceived in previous moments. To test this hypothesis, we used summary statistical information as a proxy for “gist”-level, global information in a scene. We found evidence for serial dependence in summary statistical representations. Furthermore, we show that this serial dependence occurs at the ensemble level, where local elements have already been merged into global representations. Taken together, our results reveal a mechanism through which serial dependence can promote the apparent consistency of scenes over time.
Because the environment is cluttered, objects rarely appear in isolation. The visual system must therefore attentionally select behaviorally relevant objects from among many irrelevant ones. A limit on our ability to select individual objects is revealed by the phenomenon of visual crowding: an object seen in the periphery, easily recognized in isolation, can become impossible to identify when surrounded by other, similar objects. The neural basis of crowding is hotly debated: whereas prevailing theories hold that crowded information is irrecoverable, destroyed by over-integration in early-stage visual processing, recent evidence demonstrates otherwise. Crowding can occur between high-level, configural object representations, and crowded objects can contribute with high precision to judgments about the “gist” of a group of objects, even when they are individually unrecognizable. Although existing models can account for the basic diagnostic criteria of crowding (e.g., specific critical spacing, spatial anisotropies, and temporal tuning), no current model explains how crowding can operate simultaneously at multiple levels of the visual processing hierarchy, including at the level of whole objects. Here, we present a new model of visual crowding, the hierarchical sparse selection (HSS) model, which accounts for object-level crowding as well as a number of puzzling findings in the recent literature. Counter to existing theories, we posit that crowding arises not from degraded visual representations in the brain, but from impoverished sampling of visual representations for the sake of perception. The HSS model unifies findings from a disparate array of visual crowding studies and makes testable predictions about how information in crowded scenes can be accessed.