Proceedings of the 2020 International Conference on Multimodal Interaction 2020
DOI: 10.1145/3382507.3417967

X-AWARE: ConteXt-AWARE Human-Environment Attention Fusion for Driver Gaze Prediction in the Wild

Abstract: Reliable systems for automatic estimation of the driver's gaze are crucial for reducing the number of traffic fatalities and for many emerging research areas aimed at developing intelligent vehicle-passenger systems. Gaze estimation is a challenging task, especially in environments with varying illumination and reflection properties. Furthermore, there is wide diversity with respect to the appearance of drivers' faces, both in terms of occlusions (e.g., vision aids) and cultural/ethnic backgrounds. For this re…

Cited by 15 publications
(8 citation statements)
References 49 publications
“…The integration of sensors enables AR systems to understand users' current states [99,194,204] and their environment [139,153] to provide a variety of intelligent functionalities. For example, AR could infer user intent [14] and provide contextual recommendations for daily activities (e.g., recipe recommendations when a user opens the fridge during lunch) [15,118,122].…”
Section: 21 (mentioning)
confidence: 99%
“…User State. The sensors that could be integrated within future HMDs would empower an AR system to have a rich, instant understanding of user's state, such as activities (IMU [86,219], camera [80,128,194,201], microphone [103,218,229,230]), cognitive load (eye tracking [71,104,238], EEG [20,224]), attention (eye tracking [56,99,204,231], IMU [123], EEG [213]), emotion (facial tracking [233,236], EEG [202,216]) and potential intent (the fusion of multiple sensors and low-level intelligence [14,111,211]). Depending on a user's state, the design of explanations could be different.…”
Section: Key Factors (mentioning)
confidence: 99%
“…A gaze estimation model called X-Aware is introduced in [7] to analyze the driver's face along with contextual information. The model visually improves the fusion of the captured environment of the driver's face, where the contextual attention mechanism is directly attached to the output of convolutional layers of the InceptionResNetV2 networks.…”
Section: Single-based Deep Learning Models (mentioning)
confidence: 99%
“…Deep learning models have the advantage of combining feature extraction and classification steps. Instead of the explicit processing pipeline described above, a single convolutional neural network (CNN) pre-trained on the image classification task can be used to classify cropped frames from driver-facing videos [80]- [83]. These CNN-based models reach high accuracy and can discriminate adjacent areas better than previous methods that relied on hand-crafted features.…”
Section: A. In-vehicle Gaze Estimation (mentioning)
confidence: 99%