How we perceive the physical world is not only organized in terms of objects, but also structured in time as sequences of events. This is especially evident in intuitive physics, with temporally bounded dynamics such as falling, occlusion, and bouncing demarcating the continuous flow of sensory inputs. While the spatial structure and attentional consequences of physical objects have been well-studied, much less is known about the temporal structure and attentional consequences of physical events in visual perception. Previous work has recognized physical events as units in the mind, and used presegmented object interactions to explore physical representations. However, these studies did not address whether and how perception imposes the kind of temporal structure that carves these physical events to begin with, or the attentional consequences of such segmentation during intuitive physics. Here, we use performance-based tasks to address this gap. In Experiment 1, we find that perception not only spontaneously separates visual input in time into physical events, but also does so in a nonlinear manner, within a few hundred milliseconds at the moment of the event boundary. In Experiment 2, we find that event representations, once formed, use coarse "look ahead" simulations to selectively prioritize those objects that are predictively part of the unfolding dynamics. This rich temporal and predictive structure of physical event representations, formed during vision, should inform models of intuitive physics.
Public Significance Statement
Despite the continuous flow of sensory inputs, our perceptual experiences are deeply structured in space in terms of objects and in time in terms of events. In many ways, most research in visual perception has focused on half of this structure: objects. To help reveal the structure of the other half, we turn to intuitive physics, our ability to see and predict how scenes react to forces, where events are especially evident. In intuitive physics, we do not just see objects with certain physical properties occupying certain places, but we experience dynamic patterns of interactions, for example, falling, colliding, bouncing, and entering and exiting containers. In two experiments with human observers, we synthesize methods and theories across the studies of attention, music cognition, seeing, and thinking, and find that the visual system spontaneously marks physical events at subsecond timescales and prioritizes objects relevant to the ongoing event.