In video sequence modeling, Transformer architectures currently achieve the best recognition performance. However, popular Transformer-based video classification methods concentrate on the importance of features along the temporal sequence: they characterize same-time-step (spatial) features insufficiently, and their simple data augmentation yields unstable classification results. In this paper we propose a method that combines non-parametric attention with self-supervised feature construction to further improve video classification. A non-parametric attention mechanism is built over the same-time-step features to fit their multi-local-extremum distribution. At the same time, during training, the input video is randomly masked in both the temporal and spatial domains, and the added self-supervised signal helps the model learn both fine-grained details and classification information from the video content. Experiments on the Kinetics-400, Kinetics-600, and Something-Something V2 datasets show that the proposed algorithm improves accuracy over current state-of-the-art methods.
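The random temporal- and spatial-domain masking described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the masking ratios, patch size, and the zero-fill strategy are assumed values chosen for the example.

```python
import numpy as np

def random_spatiotemporal_mask(video, frame_ratio=0.25, patch=16,
                               patch_ratio=0.25, rng=None):
    """Randomly mask a video clip in the temporal and spatial domains.

    video: array of shape (T, H, W, C); masked regions are zeroed.
    frame_ratio, patch, patch_ratio are illustrative, not the paper's values.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, H, W, C = video.shape
    out = video.copy()

    # Temporal masking: zero out a random subset of whole frames.
    n_frames = int(T * frame_ratio)
    masked_t = rng.choice(T, size=n_frames, replace=False)
    out[masked_t] = 0.0

    # Spatial masking: zero random patch-sized squares in every frame.
    n_patches = int((H // patch) * (W // patch) * patch_ratio)
    for _ in range(n_patches):
        y = rng.integers(0, H - patch + 1)
        x = rng.integers(0, W - patch + 1)
        out[:, y:y + patch, x:x + patch, :] = 0.0
    return out
```

A self-supervised objective would then ask the model to reconstruct (or remain consistent on) the masked regions, forcing it to learn the video content's details.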
To fully exploit spatio-temporal features for video action classification, we propose a multi-visual-information fusion temporal prediction network (MI-TPN), based on the feature aggregation model ActionVLAD. The method comprises three parts: multi-visual information fusion, temporal feature modeling, and spatio-temporal feature aggregation. In multi-visual information fusion, RGB features and optical flow features are combined so that both visual context and action description details are fully considered. In temporal feature modeling, an LSTM models the temporal relationships and yields importance measures among the per-step description features. Finally, in feature aggregation, the time-step features and a spatio-temporal center attention mechanism are used to aggregate the features and project them into a common feature space. The method obtains good results on three commonly used benchmark datasets: UCF101, HMDB51, and Something-Something.
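The idea of weighting per-time-step descriptors by importance before aggregation can be sketched as below. This is a simplified stand-in: the `query` vector and the scaled dot-product scoring are assumptions for illustration, not the paper's exact spatiotemporal-center attention, and the `(T, D)` features stand in for LSTM outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_temporal_pool(features, query):
    """Aggregate per-time-step features into one clip descriptor.

    features: (T, D) per-step descriptors (e.g. LSTM outputs).
    query:    (D,) attention query; learned in practice, arbitrary here.
    Returns a (D,) importance-weighted sum over the T time steps.
    """
    # Score each time step, scaling by sqrt(D) as in dot-product attention.
    scores = features @ query / np.sqrt(features.shape[1])
    weights = softmax(scores)          # importance measure per time step
    return weights @ features          # convex combination of the steps
```

Because the weights form a convex combination, the pooled descriptor stays within the range of the per-step features while emphasizing the informative steps.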