2023
DOI: 10.36227/techrxiv.22146914.v1
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Deep Neural Networks in Video Human Action Recognition: A review

Abstract: <p>Currently, spatial-temporal behavior recognition is one of the most foundational tasks of computer vision. The 2D neural networks of deep learning are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats, with the current increasingly wide usage of surveillance video and more tasks related to human action recognition. There are increasing tasks requiring temporal information for frames dependency analysis. The researchers have widely studied video-based… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
2
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
3
1

Relationship

0
4

Authors

Journals

citations
Cited by 4 publications
(2 citation statements)
references
References 99 publications
0
2
0
Order By: Relevance
“…HAR is based on computer vision with RGB, skeletal, and depth input representation. Wang et al [15] surveyed on HAR based on input data can be the skeleton, RGB, RGB + D, optical flow, etc. In the study, the authors only present and analyze the ST-GCN (Spatial Temporal Graph Convolutional Networks) [16] and 3D CNNs (3D Convolutional Neural Networks) hybrid with some architecture [17,18].…”
Section: Related Workmentioning
confidence: 99%
“…HAR is based on computer vision with RGB, skeletal, and depth input representation. Wang et al [15] surveyed on HAR based on input data can be the skeleton, RGB, RGB + D, optical flow, etc. In the study, the authors only present and analyze the ST-GCN (Spatial Temporal Graph Convolutional Networks) [16] and 3D CNNs (3D Convolutional Neural Networks) hybrid with some architecture [17,18].…”
Section: Related Workmentioning
confidence: 99%
“…Inspired by the success of deep learning [14,21], the encoderdecoder framework with attention mechanisms [2] have been dominated in MWP [18][19][20], which bring the state-of-the-art to a new level. The key idea is to use an encoder to learn representations of problem text and employ a decoder to generate the corresponding solution expression and answer.…”
Section: Introductionmentioning
confidence: 99%