In human-computer interaction applications, gesture recognition has the potential to provide a natural way of communication between humans and machines. The technology is becoming mature enough to be widely available to the public and real-world computer vision applications start to emerge. A typical example of this trend is the gaming industry and the launch of Microsoft's new camera: the Kinect. Other domains, where gesture recognition is needed, include but are not limited to: sign language recognition, virtual reality environments and smart homes. A key challenge for such real-world applications is that they need to operate in complex scenes with cluttered backgrounds, various moving objects and possibly challenging illumination conditions. In this paper we propose a method that accommodates such challenging conditions by detecting the hands using scene depth information from the Kinect. On top of our detector we employ a dynamic programming method for recognizing gestures, namely Dynamic Time Warping (DTW). Our method is translation and scale invariant which is a desirable property for many HCI systems. We have tested the performance of our approach on a digits recognition system. All experimental datasets include hand signed digits gestures but our framework can be generalized to recognize a wider range of gestures.
This paper presents a new, publicly available eye tracking dataset 1 , aimed to be used as a benchmark for Point of Gaze (PoG) detection algorithms. The dataset consists of a set of videos recording the eye motion of human test subjects as they were looking at, or following, a set of predefined points of interest on a computer visual display unit. The eye motion was recorded using a Mobile Eye, head mounted, infrared monocular camera. The ground truth of the point of gaze and head location and direction in the three dimensional space are provided together with the data. The ground truth regarding the point of gaze at is known in advance since the subjects are always looking at predefined targets, whereas, the head position in 3D is captured using a Vicon Motion Tracking System.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.