2022
DOI: 10.1109/tits.2021.3055120

HammerDrive: A Task-Aware Driving Visual Attention Model

Abstract: We introduce HammerDrive, a novel architecture for task-aware visual attention prediction in driving. The proposed architecture is learnable from data and can reliably infer the current focus of attention of the driver in real-time, while only requiring limited and easy-to-access telemetry data from the vehicle. We build the proposed architecture on two core concepts: 1) driving can be modeled as a collection of sub-tasks (maneuvers), and 2) each sub-task affects the way a driver allocates visual attention res…

Cited by 16 publications
(8 citation statements)
References 60 publications
“…For example, [188], [189] use vanishing point because drivers often focus on it to get an optimal view of the road ahead [25], [193]. Other options include drivers' actions [190]-[192], current driving task [194], [195], previous fixation locations [190]-[192], and vehicle telemetry [190]-[192], [196]. Other common features include semantic segmentation maps [197]-[199] and detected objects [196], [200]-[202].…”
Section: Bottom-up and Top-down Influences (mentioning)
confidence: 99%
“…For instance, optical flow is useful for identifying the direction and magnitude of motion in the scene [198], [203]-[205]. More recently, following successful applications in video action recognition problems [206], 3D convolutional networks have become a popular choice for encoding spatiotemporal data [194], [199], [202], [207]. Some approaches use recurrent networks, combined with individually encoded frames [208], [209] or with a set of frames processed via 3D convolutional layers [197].…”
Section: Bottom-up and Top-down Influences (mentioning)
confidence: 99%
“…They found that fixations are geared towards obtaining the most valuable task-specific information and thus the subject may not necessarily fixate on the most salient regions in a scene [10,11]. The effect of task on visual attention has given rise to task-driven attention models, which have shown to be effective for visual attention prediction [2,8]. For example, Gao et al present the new problem of Object Importance Estimation, and propose a framework composed of a visual model and a goal model to directly incorporate the effect of driving goal into the task at hand.…”
Section: Task-driven Attention (mentioning)
confidence: 99%