Classifying complex human motion sequences is a major research challenge in the domain of human activity recognition. Most popular datasets currently lack a specialized set of classes covering action sequences that are similar in terms of spatial trajectories. Recognizing such complex action sequences with high inter-class similarity, such as those in karate, requires multiple streams. To fulfill this need, we propose MS-KARD, a Multi-Stream Karate Action Recognition Dataset that uses multiple vision perspectives as well as sensor data (accelerometer and gyroscope). It includes 1518 video clips along with their corresponding sensor data. Each video was shot at 30 fps and lasts around one minute, equating to a total of 2,814,930 frames and 5,623,734 sensor data samples. The dataset was collected for 23 classes, such as Jodan Zuki and Oi Zuki. The data acquisition setup combines two orthogonally placed web cameras and three wearable inertial sensors, recording vision and inertial data respectively. The aim of this dataset is to aid research on recognizing human actions that have similar spatial trajectories. The paper describes the dataset's statistics and acquisition setup, and provides baseline performance figures using popular action recognizers. We propose an ensemble-based method, KarateNet, that performs decision-level fusion on the two input modalities (vision and sensor data) to classify actions. For the first stream, RGB frames are extracted from the videos and passed into action recognition networks such as Temporal Segment Network (TSN) and Temporal Shift Module (TSM). For the second stream, the sensor data is converted into a 2-D image and fed into a Convolutional Neural Network (CNN). The reported results were obtained by fusing the two streams. We also report results of ablations that apply fusion with various input settings. The dataset and code will be made publicly available.
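As a rough illustration of the decision-level fusion described in the abstract, the sketch below shows a small CNN over sensor data rendered as a 2-D image, and a late-fusion step that averages per-class probabilities from the vision and sensor streams. This is a minimal sketch, not the paper's implementation; the module names, layer sizes, and fusion weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorImageCNN(nn.Module):
    """Small CNN over sensor data rendered as a 2-D image
    (e.g., sensor channels stacked along one axis, time along the other).
    Architecture is a placeholder, not the paper's network."""
    def __init__(self, num_classes: int = 23):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):               # x: (B, 1, H, W)
        f = self.features(x)
        f = f.mean(dim=(2, 3))          # global average pooling -> (B, 64)
        return self.head(f)             # class logits, (B, num_classes)

def decision_level_fusion(vision_logits, sensor_logits, w_vision=0.5):
    """Late fusion: combine per-class probabilities from the two streams.
    The 0.5/0.5 weighting is an assumption for illustration."""
    p_vision = F.softmax(vision_logits, dim=-1)
    p_sensor = F.softmax(sensor_logits, dim=-1)
    return w_vision * p_vision + (1.0 - w_vision) * p_sensor

# Usage: vision_logits would come from a video backbone such as TSN or TSM;
# here both inputs are random placeholders.
vision_logits = torch.randn(4, 23)
sensor_imgs = torch.randn(4, 1, 32, 50)     # hypothetical sensor "images"
sensor_logits = SensorImageCNN()(sensor_imgs)
fused = decision_level_fusion(vision_logits, sensor_logits)
pred = fused.argmax(dim=-1)                 # fused class predictions
```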
Human Activity Recognition is an important task in Computer Vision that leverages the spatio-temporal features of videos to classify human actions. The temporal portion of a video contains vital information needed for accurate classification. However, common Deep Learning methods simply average the temporal features, giving all frames equal importance irrespective of their relevance, which negatively impacts model accuracy. To combat this adverse effect, this paper proposes a novel Transformer Based Attention Consensus (TBAC) module. The TBAC module can be used in a plug-and-play manner as an alternative to the conventional consensus methods of any existing video action recognition network. The TBAC module contains four components: (i) Query Sampling Unit, (ii) Attention Extraction Unit, (iii) Softening Unit, and (iv) Attention Consensus Unit. Our experiments demonstrate that using the TBAC module in place of classical consensus can improve the performance of CNN-based action recognition models, such as Channel Separated Convolutional Network (CSN), Temporal Shift Module (TSM), and Temporal Segment Network (TSN). We also propose the Decision Consensus (DC) algorithm, a novel fusion scheme that combines multiple independent but related action recognizer models and improves upon the performance of most of its constituent models. Results are reported on two benchmark human action recognition datasets, HMDB51 and HAA500. The proposed TBAC module combined with Decision Consensus achieves state-of-the-art performance, with classification accuracies of 85.23% on HMDB51 and 83.73% on HAA500. The code will be made publicly available.
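To make the idea of attention consensus concrete, the sketch below replaces a plain temporal average over per-segment features with a learned attention-weighted pooling, loosely following the four-stage structure named in the abstract (query sampling, attention extraction, softening, consensus). This is a minimal sketch under stated assumptions, not the TBAC implementation; the learned query, temperature value, and layer shapes are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionConsensus(nn.Module):
    """Attention-weighted consensus over per-segment features, replacing
    a plain temporal average. Loosely mirrors the TBAC stages; all names
    and hyperparameters here are hypothetical."""
    def __init__(self, dim: int, num_classes: int, temperature: float = 2.0):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))   # learned query ("query sampling")
        self.key = nn.Linear(dim, dim)                # "attention extraction"
        self.temperature = temperature                # "softening" of attention weights
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, seg_feats):                     # seg_feats: (B, T, dim)
        keys = self.key(seg_feats)                    # (B, T, dim)
        scores = keys @ self.query / seg_feats.size(-1) ** 0.5   # (B, T)
        attn = F.softmax(scores / self.temperature, dim=1)       # softened weights
        pooled = (attn.unsqueeze(-1) * seg_feats).sum(dim=1)     # weighted consensus
        return self.classifier(pooled)                # class logits, (B, num_classes)

# Usage: plug in where a TSN/TSM/CSN-style backbone would otherwise
# average its T per-segment features.
feats = torch.randn(4, 8, 256)    # 4 clips, 8 segments, 256-d features (placeholder)
logits = AttentionConsensus(dim=256, num_classes=51)(feats)
```

The temperature divides the attention scores before the softmax, flattening the weight distribution so that no single segment dominates the consensus; setting it to 1 recovers standard softmax attention.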