Abstract. The accuracy of human interaction recognition depends mainly on how discriminative the action feature descriptors are. Descriptors that contain both global and local information can usually be applied to classify interaction actions. This paper proposes a novel approach that uses the Gist feature model to recognize human interactions; the approach offers simple features, ease of operation, good real-time performance and flexible application. Drawing on the Gaussian pyramid and the center-surround mechanism, gist features are extracted from three channels to represent the human interaction motion. The classification result is then obtained by applying a frame-to-frame nearest neighbor classifier and combining the results by weighted fusion. The method is tested on the UT-Interaction dataset. The experiments show that the method achieves stable performance with a simple implementation.
Keywords: Interaction recognition · Gist feature · Nearest neighbor classifier · UT-Interaction dataset
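The classification stage summarized in the abstract (frame-to-frame nearest neighbor matching followed by weighted fusion) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the use of squared Euclidean distance, per-channel majority voting and a linear weighted sum are all assumptions made for illustration, and the gist feature extraction itself is not shown: each frame is assumed to be already represented by a feature vector for each channel.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def frame_nn_scores(test_frames, train_frames, train_labels, classes):
    """Frame-to-frame 1-NN: each test frame votes for the label of its
    nearest training frame; returns the fraction of votes per class."""
    votes = Counter()
    for frame in test_frames:
        nearest = min(
            range(len(train_frames)),
            key=lambda i: squared_distance(frame, train_frames[i]),
        )
        votes[train_labels[nearest]] += 1
    total = len(test_frames)
    return {c: votes[c] / total for c in classes}


def fuse_scores(channel_scores, weights):
    """Weighted fusion (assumed linear): combine per-channel class scores
    with normalized weights and return the best class and fused scores."""
    total_weight = sum(weights)
    fused = {}
    for scores, weight in zip(channel_scores, weights):
        for label, score in scores.items():
            fused[label] = fused.get(label, 0.0) + (weight / total_weight) * score
    return max(fused, key=fused.get), fused


if __name__ == "__main__":
    # Toy data: two training frames and two hypothetical feature channels.
    classes = ["hug", "kick"]
    train_frames = [[0.0, 0.0], [1.0, 1.0]]
    train_labels = ["hug", "kick"]
    channel_a = frame_nn_scores(
        [[0.1, 0.0], [0.2, 0.1], [0.9, 1.0]], train_frames, train_labels, classes
    )
    channel_b = frame_nn_scores(
        [[0.9, 0.9], [1.0, 0.8]], train_frames, train_labels, classes
    )
    print(fuse_scores([channel_a, channel_b], weights=[1.0, 1.0]))
```

In this sketch the fusion weights are free parameters; how they are chosen for the three channels is not stated in the abstract.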
Introduction
Human interaction recognition and understanding have attracted growing interest in the computer vision community and can be used in intelligent surveillance, video analysis, human-computer interfaces, etc. Many approaches have been proposed to deal with interaction recognition [1-4]. However, due to large intra-class variations, viewpoint changes [5], clutter, occlusion and other fundamental factors, interaction recognition is still a challenging research topic.
Many scholars treat interaction recognition as general action recognition. This kind of method usually represents the interaction as an integral descriptor that includes all the people involved in the interaction. A classifier is then used to classify the interactions. Yuan et al. [6] proposed a spatio-temporal context to describe local spatio-temporal features and the spatio-temporal relationships between them. A spatio-temporal context kernel (STCK) was then introduced to recognize human interactions. The feature extraction of this method is simple; however, the recognition accuracy is not good enough. Burghouts et al. [7] improved the performance of this method by exploiting a novel spatio-temporal layout description of the interaction, which improves inter-class discrimination. Peng et al. [8] combined four different features, including DT shape, HOG, HOF and MBH, to extract low-level features of multi-scale