2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) 2017
DOI: 10.1109/avss.2017.8078548

Action recognition based on a mixture of RGB and depth based skeleton

Abstract: In this paper, we study how different skeleton extraction methods affect the performance of action recognition. As shown in previous work, skeleton information can be exploited for action recognition. Nevertheless, the skeleton detection problem is itself hard, and it is often difficult to obtain reliable skeleton information from videos. We compare two skeleton detection methods: the depth-map-based method used with the Kinect camera and an RGB-based method that uses deep convolutional neural networks.…

Cited by 19 publications (13 citation statements)
References 33 publications
“…So, in these datasets spatio-temporal grids have contributed the most in achieving a competitive global recognition rate as compared to the state-of-the-art (results are in table 2 and 3). The performance of [8] similar to our performance on CAD-60, is not consistent on other datasets. Method [22] though outperforms our framework on CAD-120, is time expensive and the brute force enumeration over all settings of the latent variables cause extra computational cost.…”
Section: Comparison With State-of-the-art
Citation type: contrasting
confidence: 49%
“…But most of these techniques fail to discriminate the daily living activities because of its challenges. Authors in [6,8] extract spatial features from different parts of the body by first cropping out the parts from RGB frames using skeleton joint information. But, cropping different parts of the body and resizing them into 224×224 so as to feed the network is not a robust technique.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
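
The body-part cropping that the statement above criticizes can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function names `crop_patch` and `resize_nearest`, the grayscale list-of-rows image, the patch half-size, and the nearest-neighbour interpolation are all assumptions chosen to keep the example dependency-free. Only the 224×224 target size comes from the quoted text.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: grayscale image stored as rows of pixel values


def crop_patch(image: Image, center: Tuple[int, int], half_size: int) -> Image:
    """Crop a square patch centred on a joint, clamped to the image bounds.

    `center` is (x, y) in pixel coordinates and is assumed to lie inside the image.
    """
    height, width = len(image), len(image[0])
    cx, cy = center
    x0, x1 = max(0, cx - half_size), min(width, cx + half_size)
    y0, y1 = max(0, cy - half_size), min(height, cy + half_size)
    return [row[x0:x1] for row in image[y0:y1]]


def resize_nearest(patch: Image, out_h: int, out_w: int) -> Image:
    """Resize a non-empty patch with nearest-neighbour sampling."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical 10x10 image whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(10)] for r in range(10)]
patch = crop_patch(image, center=(5, 5), half_size=2)  # 4x4 patch around the joint
network_input = resize_nearest(patch, 224, 224)        # upsampled to 224x224
```

A small patch stretched to 224×224 consists mostly of repeated pixels, and a joint near the image border yields a clamped, off-centre patch. Both effects are plausible reasons for the robustness concern raised in the statement.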
“…This can be done by a technique of regression on the confusion matrix in the training phase. [23] 70.26; VA-LSTM [22] 79.4; CMN [24] 80.8; STA-hands [2] 82.5; Glimpse Clouds [3] 86.6; Proposed Method 87.09; Proposed Method (with I3D) 92.20. Table 4: Recognition Accuracy comparison for CAD-60, CAD-120, MSRDailyActivity3D (Performance of baseline is taken from [8,12,7] respectively) and NTU-RGB+D dataset.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Early models extract CNN features from video frames and aggregates them with pooling for classifying by SVM. The authors in [5,8] use different body part patches to extract features from a convolutional network in order to recognize actions. The requirement to introduce spatio-temporal relationship in videos motivated the authors in [4] to use 3D convolutions.…”
Section: Related Work On Action Recognition
Citation type: mentioning
confidence: 99%
“…In our work we utilize OpenPose [53,52], to extract skeletal joint coordinates, as in [56,57]. This library returns 2D skeletal coordinates (x_i, y_i, c_i), for i =…”
Section: Skeleton Extraction Module
Citation type: mentioning
confidence: 99%
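
The (x_i, y_i, c_i) representation mentioned in the statement above can be handled as in the following sketch. It assumes the keypoints arrive as one flat list of numbers in x, y, confidence order. The helper names `to_triples` and `reliable_joints` and the 0.3 confidence threshold are hypothetical and are not taken from the citing paper. The number of joints is deliberately left open, since the quoted text is truncated.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)


def to_triples(flat: Sequence[float]) -> List[Keypoint]:
    """Group a flat [x0, y0, c0, x1, y1, c1, ...] sequence into (x, y, c) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("expected a multiple of 3 values (x, y, c per joint)")
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def reliable_joints(
    keypoints: Sequence[Keypoint], min_conf: float = 0.3
) -> List[Tuple[int, float, float]]:
    """Return (joint index, x, y) for joints whose confidence reaches min_conf."""
    return [(i, x, y) for i, (x, y, c) in enumerate(keypoints) if c >= min_conf]


# Hypothetical detector output for three joints; the second one was not detected.
flat = [10.0, 20.0, 0.9, 0.0, 0.0, 0.0, 5.0, 6.0, 0.5]
keypoints = to_triples(flat)
kept = reliable_joints(keypoints)
```

Filtering on the confidence value is one simple way to cope with the unreliable detections that the abstract describes; downstream code can then interpolate or mask the dropped joints.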