Your understanding of what you see now surely influences what you will look at next. Yet this simple concept has only recently begun to be systematically studied and elaborated within theoretical frameworks. The Scene Perception & Event Comprehension Theory (SPECT) distinguishes between front-end and back-end processes that occur while viewers perceive and comprehend dynamic real-world events. Front-end processes occur during each eye fixation (information extraction, attentional selection), and back-end processes occur in memory (the current event model, prior knowledge, and executive processes). We begin with a selective review of the scene perception literature on bottom-up and top-down effects on attention in scenes, and highlight unanswered questions regarding the impact of the viewer's event model, that is, their understanding of what is happening now. Then, we outline the SPECT theoretical framework and review empirical evidence about how the viewer's current event model influences attentional selection. This influence is contrasted with those of visual saliency (e.g., color, brightness, motion) and task-driven control (i.e., goal setting, attentional control, inhibition). From this review, we specify a hierarchy of factors affecting attentional selection, in the order of task-driven control, visual saliency, and event models. We then propose several mechanisms by which the viewer's event model influences attentional selection, and outline a systematic approach to investigating how that influence operates while viewers watch dynamic scenes.