We propose a real-time hand gesture interface that combines a stereo pair of biologically inspired event-based dynamic vision sensor (DVS) silicon retinas with neuromorphic event-driven postprocessing. Compared with conventional vision or 3-D sensors, the use of DVSs, which output asynchronous and sparse events in response to motion, eliminates the need to extract movements from sequences of video frames and allows significantly faster and more energy-efficient processing. In addition, the rate of input events depends on the observed movements and thus provides an additional cue for solving the gesture spotting problem, i.e., finding the onsets and offsets of gestures. We propose a postprocessing framework based on spiking neural networks that processes the events received from the DVSs in real time and provides an architecture for future implementation in neuromorphic hardware devices. The motion trajectories of moving hands are detected by spatiotemporally correlating the stereoscopically verged asynchronous events from the DVSs using leaky integrate-and-fire (LIF) neurons. Adaptive thresholds of the LIF neurons achieve the segmentation of trajectories, which are then translated into discrete and finite feature vectors. The feature vectors are classified with hidden Markov models, using a separate Gaussian mixture model to spot irrelevant transition gestures. The disparity information from stereo vision is used to adapt the LIF neuron parameters so that recognition is invariant to the user's distance from the sensor, and it also helps to filter out movements in the background behind the user. Furthermore, exploiting the high dynamic range of the DVSs allows gesture recognition over a 60-dB range of scene illuminance. The system achieves recognition rates well over 90% under a variety of conditions, with static and dynamic backgrounds and with naïve users.
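To make the event-driven trajectory detection concrete, the sketch below shows how a grid of LIF neurons might spatiotemporally correlate incoming DVS address-events and adapt their thresholds with disparity. This is a minimal illustration only: the grid size, leak time constant, event weight, baseline threshold, and the linear threshold scaling are assumptions for the sketch, not parameters taken from the paper.

```python
import numpy as np

# Illustrative parameters -- assumed values, not those used in the paper.
GRID = 16      # neurons tile the sensor plane in a 16x16 grid
TAU = 0.03     # membrane leak time constant in seconds (assumed)
W = 1.0        # potential contributed by one DVS event (assumed)
THETA0 = 20.0  # baseline firing threshold (assumed)

class LIFGrid:
    """Grid of leaky integrate-and-fire neurons pooling DVS events.

    A neuron whose membrane potential crosses its threshold fires,
    marking its receptive field as part of the current hand trajectory.
    """

    def __init__(self, sensor_res=128):
        self.cell = sensor_res // GRID               # pixels per neuron side
        self.v = np.zeros((GRID, GRID))              # membrane potentials
        self.theta = np.full((GRID, GRID), THETA0)   # per-neuron thresholds
        self.t_last = 0.0                            # time of previous event

    def adapt_threshold(self, disparity):
        # A nearer hand (larger disparity) covers more pixels and emits
        # more events, so the threshold is raised to keep spotting
        # distance-invariant. The linear scaling here is an assumption.
        self.theta[:] = THETA0 * max(disparity, 1.0)

    def process(self, events):
        """events: iterable of (t, x, y) address-events from the DVS."""
        fired = []
        for t, x, y in events:
            # Exponential leak of all potentials since the previous event.
            self.v *= np.exp(-(t - self.t_last) / TAU)
            self.t_last = t
            i, j = y // self.cell, x // self.cell
            self.v[i, j] += W                        # integrate the event
            if self.v[i, j] >= self.theta[i, j]:     # correlated activity
                fired.append((t, i, j))              # trajectory point
                self.v[i, j] = 0.0                   # reset after firing
        return fired
```

In this picture, the centroid of the neurons that fire within a short window would give one trajectory point; the companion abstract below reports one such trajectory update every 3 ms.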
This demonstration shows a natural gesture interface for console entertainment devices that uses a stereo pair of dynamic vision sensors as input. The event-based processing of the sparse sensor output allows fluid interaction at a laptop processor load of less than 3%.

This paper describes a novel gesture interface based on a stereo pair of event-based vision sensors and neuromorphic event-processing techniques. The motion trajectory of a moving hand is detected every 3 ms by spatiotemporally correlating the output events of the DVSs with leaky integrate-and-fire (LIF) neurons after stereo vergence fusion. The trajectory of each gesture is automatically spotted by the thresholds of the LIF neurons, and sixteen feature vectors are then extracted from each spotted gesture trajectory. The LIF thresholds are adaptively adjusted based on the disparity obtained from stereo vision to achieve distance-invariant gesture spotting. Gesture patterns were classified using hidden Markov model (HMM) based gesture models. The implemented system was tested with six subjects (three untrained and three trained) producing continuous hand gestures (22 trials of 9 successive gestures per subject). Recognition rates ranged from 91.9% to 99.5%, depending on the subject.
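Both abstracts classify the extracted feature-vector sequences with per-gesture HMMs, while a separate model scores irrelevant transition movements for spotting. Below is a minimal sketch of that decision rule for discrete observations (the feature vectors are described above as discrete and finite). The left-to-right topology, the log-space forward recursion, and the acceptance rule comparing against a precomputed transition-model score are illustrative assumptions; the paper's actual spotting model is a Gaussian mixture.

```python
import numpy as np

def logsumexp_matvec(log_v, log_M):
    # log( exp(log_v) @ exp(log_M) ), computed stably per column.
    return np.logaddexp.reduce(log_v[:, None] + log_M, axis=0)

class DiscreteHMM:
    """HMM with discrete observations, scored via the forward algorithm."""

    def __init__(self, pi, A, B):
        self.log_pi = np.log(pi)  # (S,)   initial state distribution
        self.log_A = np.log(A)    # (S, S) transition matrix
        self.log_B = np.log(B)    # (S, K) emission probabilities

    def loglik(self, obs):
        # Forward recursion in log space for numerical stability.
        alpha = self.log_pi + self.log_B[:, obs[0]]
        for o in obs[1:]:
            alpha = logsumexp_matvec(alpha, self.log_A) + self.log_B[:, o]
        return np.logaddexp.reduce(alpha)

def classify(obs, gesture_hmms, transition_loglik):
    """Return the best gesture label, or None for a transition movement.

    gesture_hmms:      dict mapping label -> DiscreteHMM
    transition_loglik: log-likelihood of obs under the transition model
                       (a GMM in the paper; here a precomputed score).
    """
    scores = {name: hmm.loglik(obs) for name, hmm in gesture_hmms.items()}
    best = max(scores, key=scores.get)
    # Spotting rule (assumed form): accept only if the best gesture
    # model beats the transition ("garbage") model.
    return best if scores[best] > transition_loglik else None
```

In continuous operation, each trajectory segmented by the adaptive LIF thresholds would be converted to an observation sequence and passed through `classify`, so transition movements between gestures are rejected rather than forced onto the nearest gesture class.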