Recognizing complex human actions is very challenging, since training a robust learning model requires a large amount of labeled data, which is difficult to acquire. Considering that each complex action is composed of a sequence of simple actions which can be easily obtained from existing data sets, this paper presents a simple to complex action transfer learning model (SCA-TLM) for complex human action recognition. SCA-TLM improves the performance of complex action recognition by leveraging the abundant labeled simple actions. In particular, it optimizes the weight parameters, enabling the complex actions to be learned to be reconstructed by simple actions. The optimal reconstruct coefficients are acquired by minimizing the objective function, and the target weight parameters are then represented as a combination of source weight parameters. The main advantage of the proposed SCA-TLM compared with existing approaches is that we exploit simple actions to recognize complex actions instead of only using complex actions as training samples. To validate the proposed SCA-TLM, we conduct extensive experiments on two well-known complex action data sets: 1) Olympic Sports data set and 2) UCF50 data set. The results show the effectiveness of the proposed SCA-TLM for complex action recognition.
Slow feature analysis (SFA) extracts slowly varying signals from input data and has been used to model complex cells in the primary visual cortex (V1). It transmits information to both ventral and dorsal pathways to process appearance and motion information, respectively. However, SFA only uses slowly varying features for local feature extraction, because they represent appearance information more effectively than motion information. To better utilize temporal information, we propose temporal variance analysis (TVA) as a generalization of SFA. TVA learns a linear transformation matrix that projects multidimensional temporal data to temporal components with temporal variance. Inspired by the function of V1, we learn receptive fields by TVA and apply convolution and pooling to extract local features. Embedded in the improved dense trajectory framework, TVA for action recognition is proposed to: 1) extract appearance and motion features from gray using slow and fast filters, respectively; 2) extract additional motion features using slow filters from horizontal and vertical optical flows; and 3) separately encode extracted local features with different temporal variances and concatenate all the encoded features as final features. We evaluate the proposed TVA features on several challenging data sets and show that both slow and fast features are useful in the low-level feature extraction. Experimental results show that the proposed TVA features outperform the conventional histogram-based features, and excellent results can be achieved by combining all TVA features.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.