Abstract. The automatic processing and estimation of view direction and head pose in interactive scenarios is an actively investigated research topic in the development of advanced human-computer or human-robot interfaces. Still, current state of the art approaches often make rigid assumptions concerning the scene illumination and viewing distance in order to achieve stable results. In addition, there is a lack of rigorous evaluation criteria to compare different computational vision approaches and to judge their flexibility. In this work, we make a step towards the employment of robust computational vision mechanisms to estimate the actor's head pose and thus the direction of his focus of attention. We propose a domain specific mechanism based on learning to estimate stereo correspondences of image pairs. Furthermore, in order to facilitate the evaluation of computational vision results, we present a data generation framework capable of image synthesis under controlled pose conditions using an arbitrary camera setup with a free number of cameras. We show some computational results of our proposed mechanism as well as an evaluation based on the available reference data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.