SUMMARY

The authors developed a new vision-based interface that detects gestures in real time to enable interaction between the user and the computer. The interface consists of two elements: the "Motion Processor™" and the extraction of the region of interest (ROI). The Motion Processor is a new image acquisition device. By illuminating the target object with near-infrared light and collecting the reflected light with image sensors, it can eliminate the background and capture the shape, motion, and depth information of the target object alone. In addition, the authors proposed a method that uses the captured depth information to extract the ROI quickly and stably, and showed through evaluation experiments on a PC that the region can be detected with high precision within 0.06 seconds. Combining the Motion Processor with this ROI extraction method enables real-time sensing of the shape and motion of a specific object in an image, so the interface functions as a computer "eye" that can easily recognize a target object. The authors applied this real-time vision-based interface to the field of robotics, combining it with speech recognition on a single PC to build a prototype pet robot system that responds to the user's gestures.
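The summary does not give the authors' actual extraction algorithm; the sketch below only illustrates the general idea of depth-thresholded ROI extraction that the text describes. The function name, the depth threshold, and the convention that zero-depth pixels are background (no reflected near-infrared light) are all assumptions for illustration.

```python
import numpy as np

def extract_roi(depth, max_depth=1.0):
    """Return the bounding box (top, left, bottom, right) of the
    near region in a depth image, or None if nothing is close.

    Pixels with depth 0 are treated as background, mimicking how
    active near-infrared illumination suppresses distant objects.
    """
    mask = (depth > 0) & (depth <= max_depth)
    if not mask.any():
        return None
    rows = np.any(mask, axis=1)   # rows containing foreground pixels
    cols = np.any(mask, axis=0)   # columns containing foreground pixels
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    return int(top), int(left), int(bottom) + 1, int(right) + 1

# Synthetic depth map: a "hand" at 0.5 m on an empty background.
depth = np.zeros((120, 160))
depth[40:80, 60:100] = 0.5
print(extract_roi(depth))  # → (40, 60, 80, 100)
```

Because the bounding box comes from two axis-wise reductions over a boolean mask, the cost is linear in the number of pixels, which is consistent with the fast, stable extraction the summary claims.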