Abstract. We address the recognition of people's visual focus of attention (VFOA), the discrete version of gaze that indicates who is looking at whom or what. As a good indicator of addressee-hood (who speaks to whom, and in particular whether a person is speaking to the robot) and of people's interest, VFOA is an important cue for supporting dialog modelling in human-robot interactions involving multiple persons. In the absence of high-definition images, we rely on people's head pose to recognize the VFOA. Rather than as…
“…Instead, it more concerns whether the audience can perceive that they are being addressed. In many applications, instead of localizing gaze exactly, head/face orientation was used as an effective approximation of a subject's focus target [18,19]. The experiment in [20] also showed that head orientation was a reliable indicator of the visual focus of attention 89% of the time.…”
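To make the approximation concrete, below is a minimal sketch of VFOA classification from head pose: the focus target is taken to be whichever known target direction lies closest to the observed head orientation. The target list, the `vfoa_from_head_pose` helper, and the 25° "unfocused" threshold are illustrative assumptions, not details from the cited papers.

```python
import numpy as np

# Hypothetical target directions (pan, tilt in degrees) relative to the
# observed person; in a real setup these would come from scene geometry.
TARGETS = {
    "robot":    np.array([0.0, -10.0]),
    "person_A": np.array([-35.0, 0.0]),
    "person_B": np.array([40.0, 5.0]),
    "table":    np.array([10.0, -40.0]),
}

def vfoa_from_head_pose(pan, tilt, unfocused_thresh=25.0):
    """Assign the VFOA target whose direction is closest to the head pose.

    Uses head orientation as a proxy for gaze, following the approximation
    discussed above. Returns "unfocused" when no target is near enough.
    """
    pose = np.array([pan, tilt])
    best, dist = min(
        ((name, np.linalg.norm(pose - d)) for name, d in TARGETS.items()),
        key=lambda x: x[1],
    )
    return best if dist <= unfocused_thresh else "unfocused"

print(vfoa_from_head_pose(-30.0, 2.0))  # -> "person_A"
```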
Abstract. Public speaking is a non-trivial task since it is affected by how nonverbal behaviors are expressed. Practicing to deliver the appropriate expressions is difficult because they are mostly produced subconsciously. This paper presents our empirical study on the nonverbal behaviors of presenters. This information was used as the ground truth to develop an intelligent tutoring system. The system can capture bodily characteristics of presenters via a depth camera, interpret this information to assess the quality of the presentation, and then give feedback to users. Feedback is delivered immediately through a virtual conference room, in which the reactions of the simulated avatars are controlled based on the performance of the presenter.
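The capture-assess-feedback pipeline can be illustrated with a toy sketch. The `BodyFeatures` fields, the scoring weights, and the avatar reaction names are all hypothetical stand-ins; the actual system learns its assessment from the empirical ground truth described in the abstract.

```python
from dataclasses import dataclass

@dataclass
class BodyFeatures:
    """Illustrative per-frame features a depth-camera pipeline might expose."""
    gesture_energy: float    # amount of arm/hand movement
    posture_openness: float  # 0 = closed/slumped, 1 = open/upright
    facing_audience: bool

def assess_presentation(f: BodyFeatures) -> float:
    """Toy quality score in [0, 1]; weights here are purely illustrative."""
    score = 0.5 * f.posture_openness + 0.3 * min(f.gesture_energy, 1.0)
    return score + 0.2 if f.facing_audience else score

def avatar_reaction(score: float) -> str:
    """Map the score to a simulated audience reaction in the virtual room."""
    if score > 0.7:
        return "nod_and_lean_forward"
    if score > 0.4:
        return "neutral"
    return "look_away"

print(avatar_reaction(assess_presentation(BodyFeatures(0.8, 0.9, True))))
# -> "nod_and_lean_forward"
```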
“…One of them is understanding the visual focus of attention of humans while interacting with robots. This is addressed in this volume [64].…”
Section: Socially Assistive Robotics
“…In particular, forming joint attention by modeling the gaze of a human can be very useful in human-robot collaboration scenarios, or when a human teacher teaches tasks or concepts involving objects in the environment [70,64]. In [70], object saliency is used in conjunction with head pose estimates to allow a humanoid robot to determine the visual focus of attention of the interacting human, while in [64] a fixed mapping between head pose directions and gaze target directions is not assumed; instead, models are investigated that perform a dynamic (temporal) mapping, implicitly accounting for a person's varying body/shoulder orientation over time, as well as unsupervised adaptation.…”
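A crude sketch of the idea behind such a dynamic mapping is shown below: instead of equating gaze with head pose, the gaze pan is modeled as the head rotation relative to a slowly varying body/shoulder reference tracked online. The gain `kappa` and adaptation rate `alpha` are assumed parameters, and this exponential-moving-average scheme is only a stand-in for the unsupervised adaptation investigated in [64], not the actual model.

```python
import numpy as np

def dynamic_gaze_mapping(head_pans, kappa=1.4, alpha=0.02):
    """Map head pan angles to gaze directions without a fixed reference.

    The gaze pan is the head rotation relative to an online estimate of
    the body/shoulder orientation, amplified by a gain (the eyes usually
    rotate further toward a target than the head does). The reference is
    updated with a slow exponential moving average so it drifts with the
    person's body orientation over time.
    """
    reference = head_pans[0]  # initial body-orientation estimate
    gaze = []
    for pan in head_pans:
        gaze.append(reference + kappa * (pan - reference))
        reference = (1 - alpha) * reference + alpha * pan  # slow drift
    return np.array(gaze)

print(dynamic_gaze_mapping([0.0, 10.0, 20.0, 20.0]))
```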
Abstract. Human behavior is complex, but structured along individual and social lines. Robotic systems interacting with people in uncontrolled environments need capabilities to correctly interpret, predict and respond to human behaviors. This paper discusses the scientific, technological and application challenges that arise from the mutual interaction of robotics and computational human behavior understanding. We supply a short survey of the area to provide a contextual framework and describe the most recent research.
“…In [14] the SVM-based approach is improved upon with the use of Latent-Dynamic Conditional Random Fields (LDCRFs). Methods of deducing visual focus of attention (VFOA) [15], [16], [17] could also be used to infer gaze aversion. Many VFOA methods rely on head orientation estimation to distinguish the focus of attention in multi-party meeting scenarios.…”
Abstract. The aversion of gaze during dyadic conversations is a social signal that contains information relevant to the detection of interest, turn-taking cues, and conversational engagement. The understanding and modeling of such behavior has implications for the design of embodied conversational agents, as well as computational approaches to conversational analysis. Recent approaches to extracting gaze directions from monocular camera footage have achieved accurate results. We investigate ways of processing the extracted gaze signals from videos to perform gaze aversion detection. We present novel approaches that are based on unsupervised classification using spectral clustering as well as optimization methods. Three approaches that vary in their input parameters and their complexity are proposed and evaluated.
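A minimal sketch of spectral clustering applied to gaze aversion detection follows. The window size, the two summary features (offset from the median gaze direction and local variability), and the `detect_aversions` helper are assumptions for illustration; the paper's actual feature design and optimization-based variants differ.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def detect_aversions(gaze_yaw, win=15):
    """Unsupervised 2-way split of gaze windows into 'direct' vs 'averted'.

    gaze_yaw: 1-D sequence of per-frame horizontal gaze angles (degrees).
    Windows are summarized by their mean offset from the median gaze
    direction and by local variability, then spectrally clustered; the
    cluster whose windows sit further from the median is labeled averted.
    """
    gaze_yaw = np.asarray(gaze_yaw, dtype=float)
    center = np.median(gaze_yaw)
    n = len(gaze_yaw) // win
    feats = np.array([
        [abs(gaze_yaw[i * win:(i + 1) * win].mean() - center),
         gaze_yaw[i * win:(i + 1) * win].std()]
        for i in range(n)
    ])
    labels = SpectralClustering(n_clusters=2, affinity="rbf",
                                random_state=0).fit_predict(feats)
    # The cluster with the larger mean offset from center is "averted".
    averted_label = 0 if (feats[labels == 0, 0].mean()
                          > feats[labels == 1, 0].mean()) else 1
    return labels == averted_label  # one boolean per window
```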