2022
DOI: 10.1007/s11042-022-12091-z
3DFCNN: real-time action recognition using 3D deep neural networks with raw depth information

Abstract: This work describes an end-to-end approach for real-time human action recognition from raw depth image sequences. The proposal is based on a 3D fully convolutional neural network, named 3DFCNN, which automatically encodes spatio-temporal patterns from raw depth sequences. The described 3D-CNN allows action classification from the spatially and temporally encoded information of depth sequences. The use of depth data ensures that action recognition is carried out protecting people’s privacy, since their identities …
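The abstract describes a fully convolutional pipeline that maps a raw depth clip directly to class probabilities. A minimal sketch of that idea, assuming a single 3D convolution layer, global average pooling, and a linear classification head (all sizes, filter counts, and class counts here are hypothetical, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3d_valid(clip, kernels):
    """Naive 'valid' 3D convolution: clip (T,H,W), kernels (C,kt,kh,kw) -> (C,T',H',W')."""
    C, kt, kh, kw = kernels.shape
    T, H, W = clip.shape
    out = np.empty((C, T - kt + 1, H - kh + 1, W - kw + 1))
    for c in range(C):
        for t in range(out.shape[1]):
            for i in range(out.shape[2]):
                for j in range(out.shape[3]):
                    # Each output value mixes a small spatio-temporal block of the clip.
                    out[c, t, i, j] = np.sum(clip[t:t+kt, i:i+kh, j:j+kw] * kernels[c])
    return out

def classify(clip, kernels, w):
    """Fully convolutional head: ReLU, global spatio-temporal pooling, linear map, softmax."""
    feats = np.maximum(conv3d_valid(clip, kernels), 0.0)   # (C, T', H', W')
    pooled = feats.mean(axis=(1, 2, 3))                    # one value per feature channel
    logits = w @ pooled                                    # (num_classes,)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy raw-depth clip: 8 frames of 16x16 depth values (hypothetical sizes).
clip = rng.random((8, 16, 16))
kernels = rng.standard_normal((4, 3, 3, 3)) * 0.1   # 4 hypothetical spatio-temporal filters
w = rng.standard_normal((5, 4))                     # 5 hypothetical action classes
probs = classify(clip, kernels, w)
print(probs.shape)                                  # (5,)
```

Because the network is fully convolutional up to the pooling step, the same filters can in principle be applied to clips of different spatial or temporal extent; only the pooled channel vector reaches the classifier.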

Cited by 41 publications (20 citation statements)
References 94 publications
“…Action detection, often the main task of video understanding, typically estimates a 2D bounding box (region in the image) and a label for each action event on video frames. Convolutional neural networks (CNNs) have been successfully applied to vehicle motion detection, as in [26][27][28][29][30]. With respect to autonomous driving, works using ego-vehicle cameras such as [31][32][33][34] focus on detecting vehicle motion.…”
Section: Video Understanding (mentioning)
Confidence: 99%
“…Action Recognition from 3D Data. 3D action recognition models can be broadly categorized into depth-based [27][28][29][30][31], skeleton-based [32][33][34][35][36][37][38], and point-cloud-based [39][40][41][42]. Depth representations provide reliable 3D geometric cues which can be robust to different viewpoints.…”
Section: Related Work (mentioning)
Confidence: 99%
“…As in the case of RGB-based HAR works, the first depth-based studies employed methods based on handcrafted descriptors [28,41,65,67], but eventually works using deep learning became the main approach [47]. Moreover, DNN-based approaches have been shown [69] to be more robust and better suited to challenging large datasets than methods based on handcrafted features, but at much higher computational cost, which makes them difficult to use in real-time applications.…”
Section: Introduction (mentioning)
Confidence: 99%
“…The use of DNNs for HAR requires encoding the spatio-temporal information [7]. To do that, several approaches create ad-hoc representations such as depth motion maps [73] or dynamic images [72,74,76,79,81], whereas other works use specific DNNs that process video sequences, such as 3D convolutional neural networks (3DCNNs) [36,47,54,80] or recurrent neural networks (RNNs) [11,31,49,50]. A particular RNN that mitigates the exploding or vanishing gradient problem is the long short-term memory (LSTM) [17], which can successfully learn patterns in long sequences such as videos, and several LSTM layers can be stacked.…”
Section: Introduction (mentioning)
Confidence: 99%
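The last statement contrasts 3DCNNs with LSTMs for sequence modeling. The gating mechanism that lets an LSTM avoid exploding/vanishing gradients can be sketched as a single recurrence step (a textbook formulation with hypothetical feature and hidden sizes, not any cited paper's model):

```python
import numpy as np

def lstm_step(x, h, c, Wx, Wh, b):
    """One LSTM step. Gate order in the stacked weights: input, forget, cell, output."""
    H = h.shape[0]
    z = Wx @ x + Wh @ h + b            # all four gate pre-activations at once, shape (4H,)
    i = 1 / (1 + np.exp(-z[:H]))       # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))    # forget gate
    g = np.tanh(z[2*H:3*H])            # candidate cell state
    o = 1 / (1 + np.exp(-z[3*H:]))     # output gate
    c_new = f * c + i * g              # additive cell update: the path that eases gradient flow
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(1)
D, H = 6, 4                            # hypothetical per-frame feature size and hidden size
Wx = rng.standard_normal((4*H, D)) * 0.1
Wh = rng.standard_normal((4*H, H)) * 0.1
b = np.zeros(4*H)

h = np.zeros(H)
c = np.zeros(H)
for x in rng.random((10, D)):          # a 10-step toy sequence of frame features
    h, c = lstm_step(x, h, c, Wx, Wh, b)
print(h.shape)                         # (4,)
```

The key design point is the cell update `c_new = f * c + i * g`: because the cell state is carried forward additively rather than through a repeated squashing nonlinearity, gradients survive over many more time steps than in a plain RNN, which is why stacked LSTMs handle long video sequences.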