In this paper, we present a novel descriptor for human action recognition, called Motion of Oriented Magnitudes Patterns (MOMP), which considers the relationships between the local gradient distributions of neighboring patches taken from successive frames of a video. The proposed descriptor also characterizes how the information changes across different orientations, and is therefore highly discriminative and robust. The major advantages of MOMP are its very fast computation time and simple implementation. Subsequently, our features are combined with an effective coding scheme, VLAD (Vector of Locally Aggregated Descriptors), in the feature representation step, and with an SVM (Support Vector Machine) classifier in order to better represent and classify the actions. Experiments on several common benchmarks show that we obtain state-of-the-art results on the KTH dataset, as well as performance comparable to the literature on the UCF Sport dataset.
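The representation and classification stage described above (local descriptors aggregated with VLAD, then classified with a linear SVM) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the MOMP descriptors are assumed to be precomputed and are stood in for here by random arrays, and the codebook size, SVM parameters, and normalization choices are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def vlad_encode(descriptors, kmeans):
    """Aggregate residuals of descriptors to their nearest codeword (VLAD)."""
    k, d = kmeans.cluster_centers_.shape
    assignments = kmeans.predict(descriptors)
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members) > 0:
            vlad[i] = (members - kmeans.cluster_centers_[i]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))   # power normalization
    return vlad / (np.linalg.norm(vlad) + 1e-12)   # L2 normalization

# Hypothetical data: one set of local descriptors per video, plus action labels.
rng = np.random.default_rng(0)
videos = [rng.normal(size=(200, 64)) for _ in range(20)]  # 64-dim local features
labels = rng.integers(0, 4, size=20)                      # 4 action classes

# Learn a codebook on all descriptors, encode each video, and train the SVM.
codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack(videos))
X = np.array([vlad_encode(v, codebook) for v in videos])
clf = LinearSVC(C=1.0).fit(X, labels)
print(clf.predict(X[:5]))
```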
In this paper, we propose an integrated approach to robot vision: a key frame-based skeleton feature estimation and action recognition network (KFSENet) that incorporates action recognition with face and emotion recognition to enable social robots to engage in more personal interactions. Instead of extracting human skeleton features from the entire video, we propose a key frame-based approach in which the features are extracted using pose estimation models. Key frames are selected using the gradient of a proposed total motion metric computed from dense optical flow. The human skeleton features extracted from the selected key frames are then used to train a deep neural network (i.e., the double-feature double-motion network (DDNet)) for action recognition. The proposed KFSENet uses a simpler model to learn and differentiate between the action classes; it is computationally simpler and yields better action recognition performance than existing methods. The use of key frames eliminates unnecessary and redundant information, which improves classification accuracy and decreases computational cost. The proposed method is evaluated on both publicly available standard benchmark datasets and self-collected datasets, and its performance is compared with existing state-of-the-art methods. Our results indicate that the proposed method outperforms these existing methods. Moreover, the proposed framework integrates face and emotion recognition to enable social robots to engage in more personal interactions with humans.
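The key-frame selection idea described above can be sketched as follows. This is an assumption-laden illustration rather than the KFSENet implementation: the "total motion" of a frame transition is taken here to be the mean magnitude of Farneback dense optical flow, and key frames are chosen where the gradient of this curve is largest; the paper's exact metric, flow method, and selection rule may differ. The video file name is hypothetical.

```python
import cv2
import numpy as np

def total_motion_curve(video_path):
    """Return one motion value per frame transition, using dense optical flow."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    motion = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        motion.append(mag.mean())  # assumed total-motion metric for this transition
        prev_gray = gray
    cap.release()
    return np.array(motion)

def select_key_frames(motion, num_keys=16):
    """Keep frames where the gradient of the motion curve peaks in magnitude."""
    grad = np.abs(np.gradient(motion))
    return np.sort(np.argsort(grad)[-num_keys:])

# Usage (hypothetical clip):
# motion = total_motion_curve("action_clip.mp4")
# key_idx = select_key_frames(motion)
```

The selected key-frame indices would then feed a pose estimator to produce the skeleton features used by the action recognition network.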