When intelligent agents learn visuomotor behaviors from human demonstrations, they may benefit from knowing where the human is allocating visual attention, which can be inferred from their gaze. A wealth of information regarding intelligent decision making is conveyed by human gaze allocation; hence, exploiting such information has the potential to improve the agents' performance. With this motivation, we propose the AGIL (Attention Guided Imitation Learning) framework. We collect high-quality human action and gaze data while humans play Atari games in a carefully controlled experimental setting. Using these data, we first train a deep neural network that can predict human gaze positions and visual attention with high accuracy (the gaze network), and then train another network to predict human actions (the policy network). Incorporating the learned attention model from the gaze network into the policy network significantly improves both action prediction accuracy and task performance.
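The two-stage architecture described above (a gaze network that predicts an attention map, feeding a policy network that predicts actions) can be sketched roughly as follows. This is a minimal illustration under our own assumptions: the layer sizes, the fusion-by-multiplication of frame and saliency map, and the class names are ours, not the authors' exact implementation.

```python
# Minimal sketch (PyTorch) of attention-guided imitation learning.
# Assumptions: 84x84 grayscale frames, 18 discrete Atari actions, and a
# simple "multiply the frame by the predicted gaze saliency map" fusion.
# Layer sizes and names are illustrative, not the paper's exact networks.
import torch
import torch.nn as nn


class GazeNet(nn.Module):
    """Predicts a gaze saliency map (one value per pixel) from a frame."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 8, stride=4),
        )

    def forward(self, frame):                  # frame: (B, 1, 84, 84)
        logits = self.body(frame)              # (B, 1, 84, 84)
        b = logits.shape[0]
        return torch.softmax(logits.view(b, -1), dim=1).view_as(logits)


class PolicyNet(nn.Module):
    """Predicts the human action from the gaze-masked frame."""
    def __init__(self, n_actions=18):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(), nn.Flatten(),
        )
        self.head = nn.Linear(64 * 9 * 9, n_actions)

    def forward(self, frame, saliency):
        masked = frame * saliency               # attention-weighted input
        return self.head(self.conv(masked))     # action logits


# Usage: train GazeNet on recorded gaze, freeze it, then train PolicyNet on
# recorded actions with cross-entropy against the human's chosen actions.
gaze_net, policy_net = GazeNet(), PolicyNet()
frames = torch.rand(4, 1, 84, 84)
with torch.no_grad():
    saliency = gaze_net(frames)
action_logits = policy_net(frames, saliency)
```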
Here we examine the ways that the optic flow patterns experienced during natural locomotion are shaped by the movement of the observer through the environment. By recording body motion during locomotion in natural terrain, we demonstrate that head-centered optic flow is highly unstable regardless of whether the walker's head (and eye) is directed towards a distant target or at the nearby ground to monitor foothold selection. In contrast, VOR-mediated retinal optic flow has stable, reliable features that may be valuable for the control of locomotion. In particular, we found that a walker can determine whether they will pass to the left or right of their fixation point by observing the sign and magnitude of the curl of the flow field at the fovea. In addition, the divergence map of the retinal flow field provides a cue for the walker's overground velocity/momentum vector in retinotopic coordinates, which may be an essential part of the visual identification of footholds during locomotion over complex terrain. These findings cast doubt on the long-standing assumption that accurate perception of heading direction requires correcting for the effects of eccentric gaze. The present analysis of retinal flow patterns during the gait cycle suggests an alternative interpretation of the way flow is used both for the perception of heading and for the control of locomotion in the natural world.
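For reference, the curl and divergence referred to above are the standard planar vector-calculus quantities (this notation is ours, not the paper's). Writing the retinal flow field as \(\mathbf{v}(x, y) = (v_x, v_y)\), with the fovea at the origin,

\[
\operatorname{curl}\,\mathbf{v} = \frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y},
\qquad
\operatorname{div}\,\mathbf{v} = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y},
\]

so the sign of the foveal curl indicates the direction in which flow rotates about the fixation point (and hence, per the abstract, whether the walker will pass to its left or right), while the divergence map peaks where the flow radiates outward most strongly.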
We examine the structure of the visual motion projected on the retina during natural locomotion in real-world environments. Bipedal gait generates a complex, rhythmic pattern of head translation and rotation in space, so without gaze stabilization mechanisms such as the vestibulo-ocular reflex (VOR), a walker's visually specified heading would vary dramatically throughout the gait cycle. The act of fixating stable points in the environment nulls image motion at the fovea, resulting in stable patterns of outflow on the retinae centered on the point of fixation. These outflowing patterns retain a higher-order structure that is informative about the stabilized trajectory of the eye through space. We measure this structure by applying the curl and divergence operators to the retinal flow velocity vector fields and find features that may be valuable for the control of locomotion. In particular, the sign and magnitude of foveal curl in retinal flow specify the body's trajectory relative to the gaze point, while the point of maximum divergence in the retinal flow field specifies the walker's instantaneous overground velocity/momentum vector in retinotopic coordinates. Assuming that walkers can determine body position relative to gaze direction, these time-varying retinotopic cues for the body's momentum could provide a visual control signal for locomotion over complex terrain. In contrast, the temporal variation of the eye-movement-free, head-centered flow fields is large enough to be problematic for use in steering towards a goal. Consideration of optic flow in the context of real-world locomotion therefore suggests a re-evaluation of the role of optic flow in the control of action during natural behavior.
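The curl and divergence operations mentioned above can be approximated on a sampled flow field with simple finite differences. The sketch below is illustrative only: the synthetic flow field, grid, and variable names are our assumptions, not the paper's measured data or analysis code.

```python
# Sketch: estimating curl and divergence of a retinal flow field sampled
# on a pixel grid, using finite differences. The flow field below is a
# synthetic stand-in (expansion about an off-foveal point plus a weak
# rotation about the fovea); real data would be the measured retinal flow.
import numpy as np

# Retinal coordinate grid, fovea at the origin.
y, x = np.mgrid[-1:1:201j, -1:1:201j]

# Synthetic flow: expansion centred at (x0, y0) with Gaussian falloff,
# plus a weak rotation about the fovea.
x0, y0 = 0.3, -0.2
g = np.exp(-((x - x0) ** 2 + (y - y0) ** 2))
u = (x - x0) * g - 0.1 * y        # horizontal flow component
v = (y - y0) * g + 0.1 * x        # vertical flow component

# np.gradient returns derivatives along (rows, cols) = (y, x).
du_dy, du_dx = np.gradient(u, y[:, 0], x[0, :])
dv_dy, dv_dx = np.gradient(v, y[:, 0], x[0, :])

curl = dv_dx - du_dy              # scalar curl of the planar field
div = du_dx + dv_dy               # divergence of the planar field

# Sign of the foveal curl: direction of rotation of flow about fixation.
print("foveal curl:", curl[100, 100])

# Location of maximum divergence: candidate retinotopic cue for the
# walker's instantaneous overground velocity/momentum direction.
iy, ix = np.unravel_index(np.argmax(div), div.shape)
print("divergence peak near (x, y) =", (round(x[0, ix], 2), round(y[iy, 0], 2)))
```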
Large-scale public datasets have been shown to benefit research in multiple areas of modern artificial intelligence. For decision-making research that requires human data, high-quality datasets serve as important benchmarks to facilitate the development of new methods by providing a common reproducible standard. Many human decision-making tasks require visual attention to obtain high levels of performance. Therefore, measuring eye movements can provide a rich source of information about the strategies that humans use to solve decision-making tasks. Here, we provide a large-scale, high-quality dataset of human actions with simultaneously recorded eye movements while humans play Atari video games. The dataset consists of 117 hours of gameplay data from a diverse set of 20 games, with 8 million action demonstrations and 328 million gaze samples. We introduce a novel form of gameplay, in which the human plays in a semi-frame-by-frame manner. This leads to near-optimal game decisions and game scores that are comparable to or better than known human records. We demonstrate the usefulness of the dataset through two simple applications: predicting human gaze and imitating human-demonstrated actions. The quality of the data leads to promising results in both tasks. Moreover, using a learned human gaze model to inform imitation learning leads to a 115% increase in game performance. We interpret these results as highlighting the importance of incorporating human visual attention in models of decision making and demonstrating the value of the current dataset to the research community. We hope that the scale and quality of this dataset can provide more opportunities to researchers in the areas of visual attention, imitation learning, and reinforcement learning.
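For the first application mentioned, predicting human gaze, a common setup is to convert the discrete gaze samples recorded for each frame into a smoothed attention map that a network can be trained to reproduce. The sketch below shows one plausible version of that preprocessing step; the frame dimensions, Gaussian width, and function name are illustrative assumptions, not the dataset's published pipeline.

```python
# Sketch: turning per-frame gaze samples (pixel coordinates) into a
# Gaussian-smoothed attention map usable as a gaze-prediction target.
# Frame size, sigma, and names are assumptions, not the dataset's
# published preprocessing.
import numpy as np
from scipy.ndimage import gaussian_filter


def gaze_heatmap(gaze_xy, height=210, width=160, sigma=10.0):
    """Accumulate gaze samples into an image and blur into a density map."""
    heat = np.zeros((height, width), dtype=np.float64)
    for gx, gy in gaze_xy:
        ix, iy = int(round(gx)), int(round(gy))
        if 0 <= ix < width and 0 <= iy < height:
            heat[iy, ix] += 1.0
    heat = gaussian_filter(heat, sigma=sigma)
    total = heat.sum()
    return heat / total if total > 0 else heat  # normalise to a distribution


# Example: three gaze samples recorded while one frame was on screen.
samples = [(80.2, 105.7), (82.9, 101.3), (79.5, 110.0)]
target = gaze_heatmap(samples)
print(target.shape, target.sum())  # (210, 160) 1.0
```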