Gaze is an important non-verbal cue involved in many facets of social interaction, such as communication, attentiveness, and attitude. Nevertheless, extracting gaze directions visually and remotely usually suffers from large errors caused by low-resolution images, inaccurate eye cropping, or large eye-shape variations across the population, among other factors. This paper hypothesizes that these challenges can be addressed by exploiting multimodal social cues for gaze model adaptation on top of a head-pose-independent 3D gaze estimation framework. First, robust eye cropping refinement is achieved by combining a semantic face model with eye landmark detections, and we investigate whether temporal smoothing can overcome the limitations of instantaneous refinement. Second, to study whether social interaction conventions can serve as priors for adaptation, we exploit speaking status and head pose constraints to derive soft gaze labels and infer person-specific gaze bias using robust statistics. Experimental results on gaze coding in natural interactions from two different settings demonstrate that both steps of our gaze adaptation method reduce gaze errors by a large margin over the baseline and generalize to several identities in challenging scenarios.
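To make the adaptation step concrete, here is a minimal sketch of person-specific bias estimation with robust statistics. It is not the authors' implementation: the array shapes, the MAD-based inlier rule, and the soft-label inputs (e.g., gaze targets implied by who is speaking) are all assumptions for illustration.

```python
import numpy as np

def estimate_gaze_bias(predicted_gaze, soft_labels, mad_threshold=3.0):
    """Estimate a person-specific gaze bias from weak supervision.

    predicted_gaze: (N, 2) array of yaw/pitch gaze predictions (radians).
    soft_labels:    (N, 2) array of soft gaze targets derived from social
                    cues (hypothetical inputs, e.g., the head position of
                    the current speaker being looked at).
    """
    residuals = predicted_gaze - soft_labels           # per-frame error
    med = np.median(residuals, axis=0)                 # robust center
    mad = np.median(np.abs(residuals - med), axis=0)   # robust spread
    # Keep only inlier frames: soft labels are noisy by construction,
    # so outliers (glances away, label mistakes) are discarded.
    # 1.4826 scales the MAD to be consistent with a Gaussian sigma.
    inliers = np.all(np.abs(residuals - med) <= mad_threshold * 1.4826 * mad,
                     axis=1)
    return np.median(residuals[inliers], axis=0)       # bias to subtract

# Usage: corrected = raw_gaze - estimate_gaze_bias(raw_gaze, soft_labels)
```

Using the median rather than the mean keeps the estimated bias stable even when a sizable fraction of the soft labels violate the social-convention assumption.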
The strong interest children show in mobile robots makes these devices potentially powerful tools for teaching programming; the tangibility of physical objects and the sociability of interacting with them are added benefits. A key skill that programming novices must acquire is the ability to mentally trace program execution. However, because of their embodied and real-time nature, robots make this mental tracing difficult. To address this difficulty, we propose an automatic program evaluation framework based on a robot simulator, and we describe a real-time implementation that provides feedback and gamified hints to students. In a user study, we demonstrate that our hint system increases the percentage of students writing correct programs from 50% to 96% and decreases the average time to write a correct program by 30%. However, we could not show any correlation between use of the system and student performance on a questionnaire testing concept acquisition, which suggests that programming skills and concept understanding are distinct abilities. Overall, the clear performance gain shows the value of our approach for programming education using robots.
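As a rough illustration of how simulator-based program evaluation can drive hints, the sketch below grades a simulated run against a reference run. The `SimResult` fields and the hint rubric are hypothetical, not the framework described in the paper.

```python
from dataclasses import dataclass

@dataclass
class SimResult:
    """Summary of one simulated robot run (hypothetical interface)."""
    reached_goal: bool
    collisions: int
    steps: int

def evaluate_program(student: SimResult, reference: SimResult) -> list[str]:
    """Compare a student's simulated run to a reference and return hints."""
    hints = []
    if not student.reached_goal:
        hints.append("The robot never reached the target; check your loop condition.")
    if student.collisions > reference.collisions:
        hints.append("Try reading the proximity sensors before moving forward.")
    if student.reached_goal and student.steps > 2 * reference.steps:
        # A gamified nudge rather than a hard failure.
        hints.append("Challenge: can you reach the goal in half the steps?")
    return hints or ["All checks passed -- well done!"]
```

Running the student's program in simulation makes evaluation deterministic and repeatable, which a physical robot's real-time execution does not.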
Recognizing eye movements is important for understanding gaze behavior, for instance in human communication analysis (human-human or human-robot interactions) or for diagnosis (medical conditions, reading impairments). In this paper, we address this task using remote RGB-D sensors to analyze people behaving in natural conditions. This is very challenging given that such sensors typically sample at 30 Hz and provide low-resolution eye images (typically 36x60 pixels), and that natural scenarios introduce many variabilities in illumination, shadows, head pose, and dynamics. Hence, the gaze signals one can extract in these conditions have lower precision than those of dedicated IR eye trackers, rendering previous methods less appropriate for the task. To tackle these challenges, we propose a deep learning method that directly processes the eye image video streams to classify them into fixation, saccade, and blink classes, and that distinguishes irrelevant noise (illumination, low-resolution artifacts, inaccurate eye alignment, difficult eye shapes) from true eye-motion signals. Experiments on natural 4-party interactions demonstrate the benefit of our approach compared to previous methods, including deep learning models applied to gaze outputs.
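A minimal sketch of this kind of classifier is shown below: a small CNN encodes each low-resolution eye crop and a recurrent layer models temporal dynamics before per-frame classification. The layer sizes and the CNN+GRU layout are illustrative assumptions, not the authors' architecture; only the 36x60 input size and the three classes come from the abstract.

```python
import torch
import torch.nn as nn

class EyeMovementNet(nn.Module):
    """Classify eye-image clips into fixation / saccade / blink per frame."""

    def __init__(self, n_classes=3):
        super().__init__()
        # Frame encoder for 36x60 grayscale eye crops.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),                       # -> 32 * 9 * 15 features
            nn.Linear(32 * 9 * 15, 128), nn.ReLU(),
        )
        # Temporal model: lets the network separate true eye motion
        # from frame-level noise (illumination, alignment jitter).
        self.gru = nn.GRU(128, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, clips):                   # clips: (B, T, 1, 36, 60)
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)
        return self.head(out)                   # per-frame logits (B, T, 3)
```

Operating directly on image sequences, rather than on already-extracted gaze angles, is what allows the model to see the noise sources themselves instead of only their effect on the gaze signal.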
Eye gaze and facial expressions are central to face-to-face social interactions. These behavioral cues and their connections to first impressions have been widely studied in the psychology and computing literature, but typically limited to a single situation. Utilizing ubiquitous multimodal sensors coupled with advances in computer vision and machine learning, we investigate the connections between these behavioral cues and perceived soft skills in two diverse workplace situations (job interviews and reception desk). Pearson's correlation analysis shows a moderate connection between certain facial expressions, eye gaze cues, and perceived soft skills in the job interview (r ∈ [−0.30, 0.30]) and reception desk (r ∈ [0.20, 0.36]) situations. Results of our computational framework to infer perceived soft skills indicate a low predictive power of eye gaze, facial expressions, and their combination in both the interview (R² ∈ [0.02, 0.21]) and desk (R² ∈ [0.05, 0.15]) situations. Our work has important implications for employee training and behavioral feedback systems.
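The two analyses named here (per-cue Pearson correlations and the R² of a predictive model) can be sketched with standard libraries. The feature and score arrays below are random stand-ins, and Ridge regression with 5-fold cross-validation is an assumption; the abstract does not specify the model.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

# Hypothetical inputs: per-person behavioral cue features (gaze and
# facial-expression statistics) and annotator-rated soft-skill scores.
cues = np.random.rand(100, 12)      # stand-in for extracted cue features
scores = np.random.rand(100)        # stand-in for perceived soft skills

# Correlation analysis: one Pearson r per cue dimension.
for j in range(cues.shape[1]):
    r, p = pearsonr(cues[:, j], scores)
    print(f"cue {j}: r = {r:+.2f} (p = {p:.3f})")

# Predictive power: cross-validated R^2 of a regularized linear model,
# so the reported fit is measured on held-out people.
pred = cross_val_predict(Ridge(alpha=1.0), cues, scores, cv=5)
print("R^2:", r2_score(scores, pred))
```

Cross-validating before computing R² matters here: an in-sample fit would overstate the (already low) predictive power the abstract reports.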