In autonomous, as well as manually operated vehicles, monitoring the driver visual attention provides useful information about the behavior, intent and vigilance level of the driver. The gaze of the driver can be formulated in terms of a probabilistic visual map representing the region around which the driver's attention is focused. The area of the estimated region changes based on the level of confidence of the estimation. This paper proposes a framework based on convolutional neural networks (CNNs) that takes the head pose and the eye appearance of the driver as inputs, and creates a fusion model that estimates the driver's gaze on a 2D grid. The model contains upsampling layers to create estimations at multiple resolutions. The model is trained using data collected from 59 subjects with continuous recordings where the subject looks at a moving target in a parked car, and glances at a set of markers inside the car while driving the vehicle and while the car is parked. Our fusion framework provides superior performance than unimodal systems trained exclusively with head pose or eye appearance information. It estimates the gaze region with the target location lying within the 75% confidence region with an accuracy of 92.54%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.