Where does one attend when viewing dynamic scenes? Research into the factors influencing gaze location during static scene viewing has reported that low-level visual features contribute very little to gaze location, especially when opposed by high-level factors such as viewing task. However, the inclusion of transient features such as motion in dynamic scenes may result in a greater influence of visual features on gaze allocation and coordination of gaze across viewers. In the present study, we investigated the contribution of low- to mid-level visual features to gaze location during free-viewing of a large dataset of videos ranging in content and length. Signal detection analysis on visual features and Gaussian Mixture Models for clustering gaze were used to identify the contribution of visual features to gaze location. The results show that mid-level visual features including corners and orientations can distinguish between actual gaze locations and a randomly sampled baseline. However, temporal features such as flicker, motion, and their respective contrasts were the most predictive of gaze location. Additionally, moments in which all viewers' gaze tightly clustered in the same location could be predicted by motion. Motion and mid-level visual features may influence gaze allocation in dynamic scenes, but it is currently unclear whether this influence is involuntary or due to correlations with higher order factors such as scene semantics.
Does viewing task influence gaze during dynamic scene viewing? Research into the factors influencing gaze allocation during free viewing of dynamic scenes has reported that the gaze of multiple viewers clusters around points of high motion (attentional synchrony), suggesting that gaze may be primarily under exogenous control. However, the influence of viewing task on gaze behavior in static scenes and during real-world interaction has been widely demonstrated. To dissociate exogenous from endogenous factors during dynamic scene viewing we tracked participants' eye movements while they (a) freely watched unedited videos of real-world scenes (free viewing) or (b) quickly identified where the video was filmed (spot-the-location). Static scenes were also presented as controls for scene dynamics. Free viewing of dynamic scenes showed greater attentional synchrony, longer fixations, and more gaze to people and areas of high flicker compared with static scenes. These differences were minimized by the viewing task. In comparison with the free viewing of dynamic scenes, during the spot-the-location task fixation durations were shorter, saccade amplitudes were longer, and gaze exhibited less attentional synchrony and was biased away from areas of flicker and people. These results suggest that the viewing task can have a significant influence on gaze during a dynamic scene but that endogenous control is slow to kick in as initial saccades default toward the screen center, areas of high motion and people before shifting to task-relevant features. This default-like viewing behavior returns after the viewing task is completed, confirming that gaze behavior is more predictable during free viewing of dynamic than static scenes but that this may be due to natural correlation between regions of interest (e.g., people) and motion.
What controls gaze allocation during dynamic face perception? We monitored participants' eye movements while they watched videos featuring close-ups of pedestrians engaged in interviews. Contrary to previous findings using static displays, we observed no general preference to fixate eyes. Instead, gaze was dynamically directed to the eyes, nose, or mouth in response to the currently depicted event. Fixations to the eyes increased when a depicted face made eye contact with the camera, while fixations to the mouth increased when the face was speaking. When a face moved quickly, fixations concentrated on the nose, suggesting that it served as a spatial anchor. To better understand the influence of auditory speech during dynamic face perception, we presented participants with a second version of the same video, in which the audio speech track had been removed, leaving just the background music. Removing the speech signal modulated gaze allocation by decreasing fixations to faces generally and the mouth specifically. Since the task was to simply rate the likeability of the videos, the decrease of attention allocation to the mouth region implies a reduction of the functional benefits of mouth fixations given that speech comprehension was not required. Together, these results argue against a general prioritization of the eyes and support a more functional, information-seeking use of gaze allocation during dynamic face viewing.