2020
DOI: 10.48550/arxiv.2006.07743
Preprint
3DFCNN: Real-Time Action Recognition using 3D Deep Neural Networks with Raw Depth Information

Abstract: Human action recognition is a fundamental task in artificial vision that has gained great importance in recent years due to its multiple applications in different areas. In this context, this paper describes an approach for real-time human action recognition from raw depth image sequences provided by an RGB-D camera. The proposal is based on a 3D fully convolutional neural network, named 3DFCNN, which automatically encodes spatio-temporal patterns from depth sequences without pre-processing. Furthermore, …
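As a rough illustration of the core idea (not the authors' exact 3DFCNN architecture), a 3D convolution slides a spatio-temporal kernel over a depth clip of shape (frames, height, width), so each output value mixes information across both space and time. The sketch below uses a hand-rolled "valid" 3D convolution in NumPy with hypothetical clip and kernel sizes:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution: the kernel slides over (time, height, width)."""
    t, h, w = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Each output value aggregates a small spatio-temporal volume.
                out[i, j, k] = np.sum(clip[i:i + kt, j:j + kh, k:k + kw] * kernel)
    return out

# Hypothetical depth clip: 8 frames of 16x16 depth values in [0, 1).
clip = np.random.default_rng(0).random((8, 16, 16))
kernel = np.ones((3, 3, 3)) / 27.0  # simple spatio-temporal averaging filter
features = conv3d_valid(clip, kernel)
print(features.shape)  # -> (6, 14, 14)
```

A real 3D CNN stacks many such learned kernels (with channels, strides, and nonlinearities), but the sliding spatio-temporal window is the mechanism that lets it encode motion patterns directly from raw depth frames.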

Cited by 4 publications (4 citation statements)
References 71 publications
“…Our SequentialPoint-Net achieves state-of-the-art performance among all methods on the cross-view test setting and results in similar recognition accuracy as PSTNet on the cross-subject test setting. [56] depth 84.6 / 87.3; 3DFCNN (2020) [57] depth 78.1 / 80.4; Stateful ConvLSTM (2020) [58] depth 80.4 / 79.9…”
Section: Comparison With the State-of-the-art Methods
confidence: 99%
“…In [270], multi-view dynamic images were extracted through multi-view projection in depth videos for action characterization. An end-to-end 3D Fully CNN (3D-FCNN) was introduced in [271], which can automatically encode spatio-temporal information without pre-processing. In their subsequent work [272], a variant of LSTM, named stateful ConvLSTM, was further introduced to address the problem of memory limitation during video processing, which can be used to perform HAR from long and complex videos effectively.…”
Section: Deep Learning Methods
confidence: 99%
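The stateful idea described above can be illustrated independently of the ConvLSTM details: instead of unrolling a recurrent network over an entire long video at once, the sequence is processed in short chunks and the hidden state is carried across chunk boundaries, so per-step memory is bounded by the chunk length. A minimal sketch with a plain (non-convolutional) recurrent cell and hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny recurrent cell: h_t = tanh(W x_t + U h_{t-1}).
W = rng.standard_normal((4, 3)) * 0.1
U = rng.standard_normal((4, 4)) * 0.1

def run_chunk(frames, h):
    """Process one chunk of frame features, returning the updated hidden state."""
    for x in frames:
        h = np.tanh(W @ x + U @ h)
    return h

video = rng.standard_normal((100, 3))  # a "long" video: 100 frame features

# Stateless baseline: unroll over all 100 frames in one pass.
h_full = run_chunk(video, np.zeros(4))

# Stateful processing: 10 chunks of 10 frames, carrying h across chunks.
h = np.zeros(4)
for chunk in video.reshape(10, 10, 3):
    h = run_chunk(chunk, h)

# The chunked, stateful pass reproduces the full unroll exactly.
print(np.allclose(h, h_full))  # -> True
```

In training frameworks the same trick bounds the backpropagation window to one chunk while the forward state still spans the whole video, which is what makes long, complex sequences tractable.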
“…In [57], multi-view dynamic images were extracted through multi-view projections from depth videos for action recognition. In order to effectively capture spatio-temporal information in depth videos, Sanchez et al. [269] proposed a 3D fully convolutional CNN architecture for HAR. In their subsequent work [270], a variant of the LSTM unit was introduced to address the problem of memory limitation during video processing, which can be used to perform HAR from long and complex videos.…”
Section: Depth Modality
confidence: 99%