End-to-End Human-Gaze-Target Detection with Transformers

Tu, Danyang; Min, Xiongkuo; Duan, Huiyu; Guo, Guodong; Zhai, Guangtao; Shen, Wei

doi:10.1109/cvpr52688.2022.00224

Cited by 46 publications

(6 citation statements)

References 42 publications

“…Gaze Target Detection Methods Furthermore, we conduct comparisons with five recent methods: Chen (Chen et al 2021), Fang (Fang et al 2021), Tu (Tu et al 2022), Bao (Bao, Liu, and Yu 2022), and Miao (Miao, Hoai, and Samaras 2023). These methods have all demonstrated notable performance within the confines of within-dataset evaluations.…”

Section: Comparison Methods (mentioning)

confidence: 99%

“…For our method, we use the pre-trained lightweight body pose estimator RTMPose (Jiang et al 2023) and object detector YOLOv3 (Redmon et al 2016). On the other hand, competing methods introduced some other modules, e.g., face detection and depth estimation from the scene (Fang et al 2021), body pose estimation and 3D reconstruction from the scene (Bao, Liu, and Yu 2022), ViT backbone (Tu et al 2022). In order to measure their computation complexity, we also select recent high-speed implementations for them, and compared their inference speed on a single NVIDIA Titan XP GPU.…”

Section: Computational Complexity (mentioning)

confidence: 99%

Gaze Target Detection by Merging Human Attention and Activity Cues

Yang, Yin. 2024. AAAI

Despite achieving impressive performance, current methods for detecting gaze targets, which depend on visual saliency and spatial scene geometry, continue to face challenges when it comes to detecting gaze targets within intricate image backgrounds. One of the primary reasons for this lies in the oversight of the intricate connection between human attention and activity cues. In this study, we introduce an innovative approach that amalgamates the visual saliency detection with the body-part & object interaction both guided by the soft gaze attention. This fusion enables precise and dependable detection of gaze targets amidst intricate image backgrounds. Our approach attains state-of-the-art performance on both the Gazefollow benchmark and the GazeVideoAttn benchmark. In comparison to recent methods that rely on intricate 3D reconstruction of a single input image, our approach, which solely leverages 2D image information, still exhibits a substantial lead across all evaluation metrics, positioning it closer to human-level performance. These outcomes underscore the potent effectiveness of our proposed method in the gaze target detection task.

“…The main reason could be that the spatio-temporal transformer is trained with noisy gaze cues as the VidHOI dataset lacks ground-truth gaze annotations. The performance of the adopted gaze following model (Chong et al, 2020) might be a limitation of our framework, but could be improved by leveraging more recent works in that field, such as (Tu et al, 2022a;Fang et al, 2021). In addition, even though the gaze does not result in big improvement, other extensions we proposed in the spatio-temporal transformer still boost the model performance and allow us to achieve state-of-the-art in HOI detection and anticipation in videos.…”

Section: Ablation Study (mentioning)

confidence: 99%

Human–object interaction prediction in videos through gaze following

Mascaró, Ahn, et al. 2023. Computer Vision and Image Understanding

“…Zhong et al (2021) proposed a one-stage method, namely glance and gaze network, which adaptively simulated a set of action-aware points through glance and gaze steps. Tu et al (2022) presented an effective and efficient method for human-gaze-target detection and gaze following based on determination of the relations of salient objects and the human gaze from the global image context.…”

Section: Related Work (mentioning)

confidence: 99%

Analyzing students' attention by gaze tracking and object detection in classroom teaching

Zhang, Sun, et al. 2023. DTA

Purpose: Attention is one of the most important factors to affect the academic performance of students. Effectively analyzing students' attention in class can promote teachers' precise teaching and students' personalized learning. To intelligently analyze the students' attention in classroom from the first-person perspective, this paper proposes a fusion model based on gaze tracking and object detection. In particular, the proposed attention analysis model does not depend on any smart equipment.

Design/methodology/approach: Given a first-person view video of students' learning, the authors first estimate the gazing point by using the deep space–time neural network. Second, single shot multi-box detector and fast segmentation convolutional neural network are comparatively adopted to accurately detect the objects in the video. Third, they predict the gazing objects by combining the results of gazing point estimation and object detection. Finally, the personalized attention of students is analyzed based on the predicted gazing objects and the measurable eye movement criteria.

Findings: A large number of experiments are carried out on a public database and a new dataset that is built in a real classroom. The experimental results show that the proposed model not only can accurately track the students' gazing trajectory and effectively analyze the fluctuation of attention of the individual student and all students but also provide a valuable reference to evaluate the process of learning of students.

Originality/value: The contributions of this paper can be summarized as follows. The analysis of students' attention plays an important role in improving teaching quality and student achievement. However, there is little research on how to automatically and intelligently analyze students' attention. To alleviate this problem, this paper focuses on analyzing students' attention by gaze tracking and object detection in classroom teaching, which is significant for practical application in the field of education. The authors proposed an effectively intelligent fusion model based on the deep neural network, which mainly includes the gazing point module and the object detection module, to analyze students' attention in classroom teaching instead of relying on any smart wearable device. They introduce the attention mechanism into the gazing point module to improve the performance of gazing point detection and perform some comparison experiments on the public dataset to prove that the gazing point module can achieve better performance. They associate the eye movement criteria with visual gaze to get quantifiable objective data for students' attention analysis, which can provide a valuable basis to evaluate the learning process of students, provide useful learning information of students for both parents and teachers and support the development of individualized teaching. They built a new database that contains the first-person view videos of 11 subjects in a real classroom and employ it to evaluate the effectiveness and feasibility of the proposed model.
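The abstract above says gazing objects are predicted by combining the estimated gazing point with object detections, but it does not give the combination rule. The following is a minimal sketch of one plausible rule, not the authors' method: pick the detected box that contains the gaze point, preferring the smaller box when boxes are nested, and optionally accept a near miss. The names `Box`, `gazed_object`, and `max_distance` are hypothetical and introduced only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box in pixel coordinates (hypothetical type)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def distance_to(self, x: float, y: float) -> float:
        """Euclidean distance from a point to the box; 0 if the point is inside."""
        dx = max(self.x_min - x, 0.0, x - self.x_max)
        dy = max(self.y_min - y, 0.0, y - self.y_max)
        return math.hypot(dx, dy)


def gazed_object(
    gaze_xy: Tuple[float, float],
    boxes: Sequence[Box],
    max_distance: float = 0.0,
) -> Optional[Box]:
    """Return the box the gaze point most plausibly falls on, or None.

    Assumed rule: choose the box nearest to the gaze point, breaking ties
    (for example, nested boxes that both contain the point) by smaller area.
    The result is rejected if it lies farther than `max_distance` pixels.
    """
    if not boxes:
        return None
    x, y = gaze_xy
    best = min(boxes, key=lambda b: (b.distance_to(x, y), b.area()))
    return best if best.distance_to(x, y) <= max_distance else None


if __name__ == "__main__":
    detections = [
        Box("blackboard", 0, 0, 100, 50),
        Box("book", 40, 20, 60, 40),
    ]
    target = gazed_object((50, 30), detections)
    print(target.label if target else "no object")
```

With the default `max_distance` of 0, a gaze point must lie inside a box to be matched; raising it tolerates small errors in the estimated gaze point at the cost of more false matches.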
