2020
DOI: 10.1145/3350840
Action Recognition Using Form and Motion Modalities

Abstract: Action recognition has attracted increasing interest in computer vision due to its potential applications in many vision systems. One of the main challenges in action recognition is to extract powerful features from videos. Most existing approaches exploit either hand-crafted techniques or learning-based methods to extract features from videos. However, these methods mainly focus on extracting the dynamic motion features, which ignore the static form features. Therefore, these methods cannot fully capture the …
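The abstract distinguishes static form features (appearance within a frame) from dynamic motion features (change across frames). A loose illustration of the distinction, assuming toy descriptors of our own invention (the function names and feature choices below are not from the paper):

```python
import numpy as np

# Hypothetical sketch: a "form" feature summarizes the static appearance of
# one frame, while a "motion" feature summarizes dynamics across frames.

def form_feature(frame):
    """Static form descriptor: a coarse intensity histogram of a single frame."""
    hist, _ = np.histogram(frame, bins=16, range=(0.0, 1.0))
    return hist / hist.sum()

def motion_feature(video):
    """Dynamic motion descriptor: mean absolute frame-to-frame difference."""
    diffs = np.abs(np.diff(video, axis=0))   # (T-1, H, W)
    return diffs.mean(axis=(1, 2))           # one motion score per transition

rng = np.random.default_rng(0)
video = rng.random((8, 32, 32))              # T=8 frames of 32x32 "pixels"

f = form_feature(video[0])                   # shape (16,)
m = motion_feature(video)                    # shape (7,)
```

A method that uses only `motion_feature` would, as the abstract argues, discard everything `form_feature` captures about a single frame's appearance.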

Cited by 14 publications (5 citation statements)
References 44 publications
“…The classifier demonstrates above-human performance and surpasses the sparse spatio-temporal feature approach detailed in Burgos-Artizzu et al 6 on the CRIM13 dataset. It also performs better than prior methods based on temporal features 25, independent component analysis 26, hierarchical sparse coding 27, and integrated sparse and dense trajectory features 28.…”
Section: Results (mentioning, confidence: 96%)
“…This indicates that the multi-head attention mechanism is useful for recognizing the action, and the proposed MAT-EffNet is a competitive network for action recognition.

Method           Top-1 (%)  Top-5
[12]             63.3       -
I3D-RGB [12]     72.1       90.3%
ARTNet [52]      70.7       89.3%
MoViNet-A5 [66]  71.7       -
VidTr-L [67]     70.2       89%
MAT-EffNet       72.6       90.8%

Method                       Input                        Acc.   Acc.
LRCN [38]                    RGB + optical flow           82.9   -
C3D [26]                     RGB only + 3D CNNs           85.2   -
IDTs [32]                    RGB only + 3D CNNs           85.9   57.2%
Two-stream [1]               RGB + optical flow           88.0   59.4%
FSTCN [39]                   RGB + optical flow           88.1   59.1%
P3D-199 [65]                 RGB + 3D CNNs                89.2   62.9%
TDD [34]                     RGB + optical flow           90.3   63.2%
STS-network [17]             RGB + optical flow + others  90.1   62.4%
R-M3D [11]                   RGB only + 3D CNNs           93.2   65.4%
STDAN + RGB difference [58]  RGB + optical flow + others  91.0   60.4%
TSN Corrnet [55]             RGB + optical flow           94.4   70.6%
MSM-ResNets [56]             RGB + optical flow + others  93.5   66.7%
R-STAN-50 [68]               RGB + optical flow           91.…”
Section: Exploration Of Mat-effnet On the Kinetics-400 Datasetmentioning
confidence: 99%
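The excerpt above credits MAT-EffNet's results to its multi-head attention mechanism. A minimal, generic sketch of multi-head scaled dot-product attention follows; this is the standard mechanism, not MAT-EffNet's actual implementation (whose details are not given here), and identity projections stand in for the learned query/key/value weights:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads):
    """x: (seq_len, d_model). Splits d_model across heads, attends per head."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    # Identity projections for brevity; real models learn W_q, W_k, W_v.
    q = k = v = x.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    out = softmax(scores) @ v                            # (heads, seq, d_head)
    # Concatenate the heads back into one d_model-wide representation.
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

x = np.random.default_rng(1).standard_normal((10, 64))
y = multi_head_attention(x, num_heads=8)   # shape (10, 64)
```

Each head attends over the sequence in its own subspace, which is why multiple heads can pick up complementary cues (e.g. different spatio-temporal patterns) from the same features.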
“…Y. Li et al [43] proposed a novel spatio-temporal deep residual neural network via categorized attentions (STDRN-HA) for human video event detection and classification. In [44], Q. Meng et al proposed a novel Support Vector Machine (SVM) classification-based feature extraction method for event classification and detection; however, reliance on a single SVM limits the system's accuracy. In [45], S. Sun et al developed a guided optical-flow feature extraction approach via a Convolutional Neural Network (CNN) for human event detection.…”
Section: Conflicts of Interest (mentioning, confidence: 99%)
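The excerpt above attributes the low accuracy in [44] to reliance on a single SVM. To make the idea of a single linear SVM concrete, here is a toy sketch trained with hinge-loss sub-gradient descent on synthetic 2D data; it is illustrative only and not the method of [44]:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Toy linear SVM: y in {-1, +1}; returns weights w and bias b."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:          # inside the margin: hinge active
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                              # outside the margin: only shrink w
                w -= lr * lam * w
    return w, b

# Well-separated toy clusters: class +1 near (2, 2), class -1 near (-2, -2).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
acc = (pred == y).mean()                       # training accuracy
```

A single model like this learns one decision boundary; complex event classes typically need richer models or ensembles, which is consistent with the accuracy limitation the excerpt notes.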