2023
DOI: 10.21203/rs.3.rs-2600609/v1
Preprint

Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer

Abstract: Recognizing human actions in video sequences, known as Human Action Recognition (HAR), is a challenging task in pattern recognition. While Convolutional Neural Networks (ConvNets) have shown remarkable success in image recognition, they are not always directly applicable to HAR, as temporal features are critical for accurate classification. In this paper, we propose a novel dynamic PSO-ConvNet model for learning actions in videos, building on our recent work in image recognition. Our approach leverages a frame…
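The abstract pairs ConvNets with particle swarm optimization (PSO) but does not show the paper's exact dynamics. As background, here is a minimal sketch of the canonical PSO update such a model builds on. All names are hypothetical, and an illustrative quadratic objective stands in for a ConvNet's validation loss so the sketch stays self-contained:

```python
import numpy as np

def pso_minimize(loss, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Canonical PSO (illustrative sketch, not the paper's method).

    In a PSO-ConvNet setting each particle would hold one network's
    weight vector and `loss` would be its validation loss; here a
    simple quadratic is used instead.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))    # particle positions
    v = np.zeros((n_particles, dim))              # particle velocities
    pbest = x.copy()                              # per-particle best positions
    pbest_val = np.array([loss(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()      # swarm-wide best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity update: inertia + cognitive pull + social pull.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([loss(p) for p in x])
        improved = vals < pbest_val               # refresh personal bests
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()  # refresh global best
    return gbest, pbest_val.min()

# Toy objective with its minimum at the origin.
best, best_val = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
```

The swarm converges toward the origin on this toy objective; the paper's "dynamic" and collaborative variants would modify how particles (networks) share information, which this sketch does not attempt to reproduce.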

Cited by 2 publications (2 citation statements)
References 42 publications
“…However, our method combines CNN with a transformer, endowing it with higher generalization ability and robustness. Nguyen et al [43] (two-stream CNN): 86.10%; Kim et al [44] (two-stream CNN): 87.50%; proposed method (CNN + transformer): 87.50%. On the UCF 50 dataset, our method demonstrates excellent performance with an accuracy of 83.41%, as shown in Table 7. The LC + Multiview pooling method proposed by Liu et al [45] achieves an accuracy of 78.60% on the UCF 50 dataset, significantly lower than our method.…”
Section: Number of Frames as Input
confidence: 90%
“…[42] achieves an accuracy of 54.96% on the UCF 101 dataset, significantly lower than our method. This finding further confirms the higher classification accuracy and robustness of our approach in handling complex action recognition tasks. The two-stream CNN method suggested by Nguyen et al [43] attains an accuracy of 86.10% on the UCF 101 dataset. While this result is already relatively high, it is slightly less accurate than our method.…”
Section: Number of Frames as Input
confidence: 99%