Which region of the visual field is most useful for recognizing scene gist: central vision (the fovea and parafovea), given its higher visual resolution and importance for object recognition, or the periphery, given its large extent and its sensitivity to the lower spatial frequencies useful for scene gist recognition? Scenes were presented in two experimental conditions: a "Window," a circular region showing the central portion of a scene while blocking peripheral information, or a "Scotoma," which blocks the central portion of a scene and shows only the periphery. Results indicated that the periphery was more useful than central vision for attaining maximal performance (i.e., performance equal to seeing the entire image). Nevertheless, central vision was more efficient for scene gist recognition than the periphery on a per-pixel basis. A critical radius of 7.4 degrees was found at which the Window and Scotoma performance curves crossed, producing equal performance. This value was compared with critical radii predicted from cortical magnification functions on the assumption that equal V1 activation would produce equal performance. However, these predictions were systematically smaller than the empirical critical radius, suggesting that the utility of central vision for gist recognition is less than V1 cortical magnification would predict.
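To make the cortical-magnification prediction concrete, the following is a minimal sketch assuming Horton and Hoyt's (1991) V1 magnification function, M(E) = 17.3/(E + 0.75) mm/deg, and a circular stimulus field of 13.5° radius (a placeholder value, not taken from the study). The predicted critical radius is the eccentricity at which a central disk and the surrounding periphery map onto equal areas of V1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# Linear cortical magnification (Horton & Hoyt, 1991): mm of V1 per degree.
M0, E2 = 17.3, 0.75   # mm/deg and deg (published estimates)
E_MAX = 13.5          # deg; placeholder radius of the stimulus field, not from the study

def areal_magnification(e):
    """Areal magnification factor: mm^2 of V1 per deg^2 at eccentricity e."""
    return (M0 / (e + E2)) ** 2

def v1_area(r):
    """V1 area (mm^2) representing a central disk of radius r degrees."""
    return quad(lambda e: areal_magnification(e) * 2 * np.pi * e, 0, r)[0]

# Predicted critical radius: the central disk and the remaining periphery
# activate equal amounts of V1, i.e., v1_area(r) equals half of v1_area(E_MAX).
half_total = v1_area(E_MAX) / 2
critical_radius = brentq(lambda r: v1_area(r) - half_total, 0.1, E_MAX)
print(f"Predicted critical radius ~ {critical_radius:.2f} deg")
```

With these placeholder assumptions the predicted radius comes out to roughly 4 degrees, well below the empirical 7.4 degrees, illustrating the kind of mismatch described above.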
What is the relationship between film viewers' eye movements and their film comprehension? Typical Hollywood movies induce strong attentional synchrony: most viewers look at the same things at the same time. We therefore asked whether film viewers' eye movements would differ based on their understanding (the mental model hypothesis) or whether any such differences would be overwhelmed by viewers' attentional synchrony (the tyranny of film hypothesis). To investigate this question, we manipulated the presence or absence of prior film context and measured the resulting differences in film comprehension and eye movements. Viewers watched a 12-second James Bond movie clip, ending just as a critical predictive inference should be drawn: that Bond's nemesis, "Jaws," would fall from the sky onto a circus tent. Viewers in the No-context condition saw only the 12-second clip, whereas viewers in the Context condition also saw the preceding 2.5 minutes of the movie before the critical 12-second portion. Importantly, Context viewers were more likely to draw the critical inference and more likely to perceive coherence across the entire six-shot sequence (as shown by event segmentation), indicating greater comprehension. Viewers' eye movements showed strong attentional synchrony in both conditions relative to a chance-level baseline, with only small differences between conditions. Specifically, Context viewers showed slightly, but significantly, greater attentional synchrony and lower cognitive load (as shown by fixation probability) during the critical first circus-tent shot. Thus, overall, the results were more consistent with the tyranny of film hypothesis than with the mental model hypothesis. These results suggest the need for a theory that encompasses processes from the perception to the comprehension of film.
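The abstract does not specify how attentional synchrony or its chance-level baseline were computed. As one illustrative approach (an assumption, not the measure used in the study), the sketch below scores synchrony as per-frame gaze dispersion and builds a chance baseline by circularly time-shifting each viewer's scanpath, which destroys frame-by-frame alignment while preserving individual gaze behavior.

```python
import numpy as np

def gaze_dispersion(gaze):
    """Mean distance of each viewer's gaze from the group centroid, per frame.

    gaze: array of shape (n_viewers, n_frames, 2) with (x, y) positions in degrees.
    Lower dispersion indicates stronger attentional synchrony.
    """
    centroid = np.nanmean(gaze, axis=0, keepdims=True)                    # (1, n_frames, 2)
    return np.nanmean(np.linalg.norm(gaze - centroid, axis=-1), axis=0)   # (n_frames,)

def chance_baseline(gaze, n_perm=1000, seed=0):
    """Chance-level dispersion from circularly time-shifted scanpaths."""
    rng = np.random.default_rng(seed)
    n_viewers, n_frames, _ = gaze.shape
    baselines = np.empty(n_perm)
    for i in range(n_perm):
        shifts = rng.integers(0, n_frames, size=n_viewers)
        shuffled = np.stack([np.roll(gaze[v], shifts[v], axis=0)
                             for v in range(n_viewers)])
        baselines[i] = np.nanmean(gaze_dispersion(shuffled))
    return baselines
```

Observed dispersion well below the time-shifted baseline would indicate that viewers converge on the same screen locations at the same moments.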
This study investigated the relative roles of visuospatial versus linguistic working memory (WM) systems in the online generation of bridging inferences while viewers comprehend visual narratives. We contrasted these roles in the visuospatial primacy hypothesis versus the shared (visuospatial and linguistic) systems hypothesis and tested them in three experiments. Participants viewed picture stories containing multiple target episodes, each consisting of a beginning state, a bridging event, and an end state, and the presence of the bridging-event image was manipulated. When it was absent, viewers had to infer the bridging-event action to comprehend the end-state image. A pilot study showed that after viewing the end-state image, participants' think-aloud protocols contained more inferred actions when the bridging event was absent than when it was present. Likewise, Experiment 1 found longer viewing times for the end-state image when the bridging-event image was absent, consistent with viewing times revealing online inference generation processes. Experiment 2 showed that both linguistic and visuospatial WM loads attenuated the inference viewing-time effect, consistent with the shared systems hypothesis. Importantly, however, Experiment 3 found that articulatory suppression did not attenuate the inference viewing-time effect, indicating that (sub)vocalization did not support online inference generation during visual narrative comprehension. Thus, the results support a shared-systems hypothesis in which both visuospatial and linguistic WM systems support inference generation in visual narratives, with the linguistic WM system operating at a level deeper than (sub)vocalization.
Understanding how people comprehend visual narratives (including picture stories, comics, and film) requires combining traditionally separate theories that span the initial sensory and perceptual processing of complex visual scenes, the perception of events over time, and the comprehension of narratives. Existing piecemeal approaches fail to capture the interplay between these levels of processing. Here, we propose the Scene Perception & Event Comprehension Theory (SPECT), as applied to visual narratives, which distinguishes between front-end and back-end cognitive processes. Front-end processes occur during single eye fixations and comprise attentional selection and information extraction. Back-end processes occur across multiple fixations and support the construction of event models, which reflect understanding of what is happening now in a narrative (stored in working memory) and over the course of the entire narrative (stored in long-term episodic memory). We describe the relationships between front-end and back-end processes, as well as medium-specific differences (e.g., picture stories vs. film) that likely produce variation in both. We then describe several novel research questions derived from SPECT that we have explored. By addressing these questions, we provide greater insight into how attention, information extraction, and event-model processes are dynamically coordinated to perceive and understand complex, naturalistic visual events in narratives and the real world.