2021
DOI: 10.3390/s21124143
Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze

Abstract: Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we impleme…


Cited by 24 publications (26 citation statements)
References 79 publications
“…The evaluation between different grids is shown in Tab. 4. "Pixel-level" refers to the evaluation of the saliency map using the D_KL and CC metrics.…”
Section: Quantitative Results
confidence: 99%
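The D_KL (Kullback-Leibler divergence) and CC (Pearson correlation coefficient) metrics quoted above are standard measures for comparing a predicted saliency map against a ground-truth map. A minimal NumPy sketch, assuming both maps are non-negative 2-D arrays (normalisation conventions vary between benchmarks):

```python
import numpy as np

def kl_divergence(pred, gt, eps=1e-8):
    """KL divergence D_KL(gt || pred) between two saliency maps.

    Both maps are first normalised to probability distributions.
    Lower values indicate closer agreement.
    """
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float(np.sum(g * np.log(g / (p + eps) + eps)))

def correlation_coefficient(pred, gt):
    """Pearson correlation coefficient (CC) between two saliency maps.

    Values near 1 indicate strong positive agreement; 0 means no
    linear relation; -1 indicates inverted maps.
    """
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())
```

An identical pair of maps yields D_KL near 0 and CC near 1, which is a quick sanity check for either implementation.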
See 3 more Smart Citations
“…The evaluation between different grids is shown in Tab. 4. "Pixel-level" refers to the evaluation of the saliency map using š· š¾šæ and š¶š¶ metrics.…”
Section: Quantitative Resultsmentioning
confidence: 99%
“…Previous works [20,53] set out to reduce tedious labelling by using gaze-object mapping, which annotates objects at the fixation level, i.e., the object being looked at. One popular algorithm checks whether a fixation lies in the object bounding box predicted by a deep neural network-based object detector [4,21,29] such as YOLOv4 [5]. Wolf et al [53] suggest using object segmentation with Mask-RCNN [12] for object area detection.…”
Section: :3
confidence: 99%
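The bounding-box check described in the quote — mapping a fixation to the detected object whose box contains it — can be sketched as follows. The `Box` type and the smallest-box tie-break for overlapping detections are illustrative assumptions here, not details prescribed by the cited papers:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Box:
    """A detector output: class label plus corner coordinates in pixels."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

def map_fixation_to_object(fx: float, fy: float,
                           boxes: List[Box]) -> Optional[str]:
    """Return the label of the detected box containing the fixation (fx, fy).

    When several boxes overlap the fixation, the smallest box wins —
    a hypothetical tie-break favouring the more specific object.
    Returns None if the fixation falls outside every box.
    """
    hits = [b for b in boxes
            if b.x1 <= fx <= b.x2 and b.y1 <= fy <= b.y2]
    if not hits:
        return None
    hits.sort(key=lambda b: (b.x2 - b.x1) * (b.y2 - b.y1))
    return hits[0].label
```

In a pipeline, the boxes would come from a per-frame detector pass (e.g. YOLOv4, as the quote notes) and each fixation would be looked up in the frame it landed on.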
“…Graphical information includes discrete points of focus, whereas textual information includes continuous points of focus. Based on prior studies [8], [12], [27], [28], [45], we considered a fixation duration of 200 ms as the threshold to define a fixation event. Intuitively, the fixation counts also increase proportionally with fixation duration.…”
Section: Fixation Count
confidence: 99%