We present a novel technique for barehanded interaction with virtual 3D content using a time-of-flight camera. The system improves on existing 3D multi-touch systems by working regardless of lighting conditions and by providing a working volume large enough for multiple users. Previous systems were limited by environmental requirements, working volume, or the computational resources necessary for real-time operation. Using a time-of-flight camera, the system reliably recognizes gestures at the finger level in real time at more than 50 fps on commodity computer hardware, based on our newly developed precision hand- and finger-tracking algorithm. Building on this algorithm, the system performs gesture recognition via simple constraint modeling over statistical aggregations of hand appearances in a working volume of more than 8 cubic meters. Two iterations of user tests were performed on a prototype system, demonstrating the feasibility and usability of the approach and providing first insights into the acceptance of true barehanded touch-based 3D interaction.
INTRODUCTION

Multi-touch interaction techniques have recently become widely available, being used, for instance, in tabletop systems such as Microsoft's Surface [30], projection-based systems such as CityWall [24], and desktop systems such as HP's TouchSmart series, as well as in several mobile devices, in particular smartphones such as the Google Nexus and, of course, the iPhone. The introduction of multi-touch interaction techniques has probably been the most important change to user input since the introduction of the mouse.

To date, multi-touch interaction is typically surface-based: objects must be selected by touching the corresponding surface with the hands before they can actually be manipulated.

Freehand multi-touch has been explored in various approaches, e.g. Oblong's g-speak¹, after initially having been introduced in the popular Hollywood movie "Minority Report". These approaches usually depended heavily on hand- or wrist-worn gloves, markers, or other input devices, and typically did not achieve the intuitiveness, simplicity, and efficiency of surface-based (2D) multi-touch techniques². In contrast to those approaches, the goal of our approach is to use barehanded interaction as a replacement for surface-based interaction.

Only a vision-based approach allows for freehand and barehanded 3D multi-touch interaction. Such a system must provide sufficient solutions for the following four steps: (1) detect hand positions without prior knowledge of their existence; (2) determine the pose of each appearance from image cues; (3) track appearances over time; and (4) recognize gestures based on their trajectory and pose information. Various approaches exist to solve all four problems, each featuring different advantages and disadvantages.

Barehanded 3D interaction has recently been presented by Mgestyk³ and provides the basis for Microsoft's Kinect interface for the Xbox 360⁴. However, these approaches are limited either to a sing...
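The four-step pipeline outlined above can be sketched in code. The following is a minimal illustrative skeleton over synthetic depth detections, not the paper's actual algorithm; all names (`Hand`, `HandTracker`, `detect_hands`, the toy gesture rules, and the distance thresholds) are our own illustrative assumptions.

```python
# Hypothetical sketch of the four-stage pipeline: detection, pose
# estimation, tracking, and gesture recognition. Depth "frames" are
# simplified to lists of blob dicts with camera-space coordinates.
from dataclasses import dataclass, field

@dataclass
class Hand:
    hand_id: int
    position: tuple                              # (x, y, z) centroid
    finger_count: int                            # crude pose cue
    trajectory: list = field(default_factory=list)

def detect_hands(depth_frame, near=0.5, far=1.5):
    """Step 1: segment candidate hand blobs by depth thresholding."""
    return [blob for blob in depth_frame if near <= blob["z"] <= far]

def estimate_pose(blob):
    """Step 2: derive a pose cue (here simply a finger count)."""
    return blob.get("fingers", 0)

class HandTracker:
    """Step 3: associate detections across frames by nearest neighbour."""
    def __init__(self, max_dist=0.2):
        self.hands = {}
        self.next_id = 0
        self.max_dist = max_dist

    def update(self, detections):
        for blob in detections:
            pos = (blob["x"], blob["y"], blob["z"])
            match, best = None, self.max_dist
            for hand in self.hands.values():
                d = sum((a - b) ** 2 for a, b in zip(pos, hand.position)) ** 0.5
                if d < best:
                    best, match = d, hand
            if match is None:                    # unseen hand: new track
                match = Hand(self.next_id, pos, 0)
                self.hands[self.next_id] = match
                self.next_id += 1
            match.position = pos
            match.finger_count = estimate_pose(blob)
            match.trajectory.append(pos)
        return list(self.hands.values())

def recognize_gesture(hand):
    """Step 4: classify from pose and trajectory (toy rules only)."""
    if hand.finger_count <= 1 and len(hand.trajectory) > 1:
        return "point"
    if len(hand.trajectory) > 1 and \
            hand.trajectory[-1][0] - hand.trajectory[0][0] > 0.3:
        return "swipe-right"
    return "idle"
```

A real system replaces each stage with the corresponding computer-vision component (blob segmentation on the depth image, contour-based finger detection, robust data association, and trajectory classification), but the control flow between the four stages remains as shown.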