Human behavioral experiments have led to influential conceptualizations of visual attention, such as a serial processor or a limited-resource spotlight. There is growing evidence that simpler organisms such as insects show behavioral signatures associated with human attention. Can such organisms learn these capabilities without human-like conceptualizations of attention? We show that a feedforward convolutional neural network (CNN) with a few million neurons, trained on noisy images to detect targets, learns to utilize predictive cues and context. We demonstrate that the CNN predicts human performance and gives rise to the three most prominent behavioral signatures of covert attention: Posner cueing, set-size effects in search, and contextual cueing. The CNN also approximates an ideal Bayesian observer that has full prior knowledge of the statistical properties of the noise, targets, cues, and context. The results help explain how even simple biological organisms can show human-like visual attention by implementing neurobiologically plausible, simple computations.
Gaze direction is an evolutionarily important mechanism in daily social interactions. It reflects a person's internal cognitive state and spatial locus of interest, and predicts future actions. Studies using static head images presented foveally in simple synthetic tasks have found that gaze orients attention and facilitates target detection at the cued location in a sustained manner. Little is known, however, about how people's natural gaze behavior, including eye, head, and body movements, jointly orients covert attention, drives microsaccades, and facilitates performance in more ecological, dynamic scenes. Participants completed a target-person detection task with videos of real scenes. The videos showed people looking toward (valid cue) or away from (invalid cue) a target location. We digitally manipulated the individuals in the videos directing gaze to create three conditions: intact (head + body movements), floating heads (only head movements), and headless bodies (only body movements). We assessed their impact on participants' behavioral performance and microsaccades during the task. We show that, in isolation, an individual's head or body orienting toward the target-person direction led to facilitation in detection that was transient in time (200 ms). In contrast, only whole silhouettes led to sustained facilitation (500 ms). Furthermore, observers executed microsaccades more frequently toward the cued direction on valid trials, but this bias was sustained in time only when full silhouettes were present. Together, the results differ from previous findings with foveally presented static heads. In more real-world scenarios and tasks, sustained attention requires the presence of the whole silhouettes of the individuals dynamically directing their gaze.