Action recognition based on a mixture of RGB and depth based skeleton

Das, Srijan; Koperski, Michal; Brémond, François; Francesca, Gianpiero

doi:10.1109/avss.2017.8078548

Cited by 19 publications

(13 citation statements)

References 33 publications


“…So, in these datasets spatio-temporal grids have contributed the most in achieving a competitive global recognition rate as compared to the state-of-the-art (results are in tables 2 and 3). The performance of [8], similar to our performance on CAD-60, is not consistent on other datasets. Method [22], though it outperforms our framework on CAD-120, is time-expensive, and the brute-force enumeration over all settings of the latent variables causes extra computational cost.…”

Section: Comparison With State-of-the-art (contrasting)

confidence: 49%

“…But most of these techniques fail to discriminate the daily living activities because of their challenges. Authors in [6,8] extract spatial features from different parts of the body by first cropping out the parts from RGB frames using skeleton joint information. But cropping different parts of the body and resizing them to 224×224 to feed the network is not a robust technique.…”

Section: Related Work (mentioning)

confidence: 99%


Spatio-Temporal Grids for Daily Living Action Recognition

Das, Sakhalkar, Koperski, et al. 2018

Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing

Self Cite


This paper addresses the recognition of short-term daily living actions from RGB-D videos. Existing approaches ignore the spatio-temporal contextual relationships in action videos, so we propose to explore the spatial layout to better model appearance. To encode temporal information, we divide the action sequence into temporal grids. We address the challenge of subject invariance by applying clustering on the appearance features and velocity features to partition the temporal grids. We validate our approach on four public datasets. The results show that our method is competitive with the state-of-the-art.
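The temporal-grid idea in this abstract can be illustrated with a minimal Python sketch. It assumes uniform, contiguous grids and mean pooling of per-frame feature vectors; the paper instead partitions the grids by clustering appearance and velocity features, which this sketch does not reproduce. The names `temporal_grids` and `grid_descriptor` are hypothetical.

```python
from typing import List, Sequence


def temporal_grids(frames: Sequence[Sequence[float]], num_grids: int) -> List[List[float]]:
    """Split per-frame feature vectors into `num_grids` contiguous temporal
    grids and mean-pool each grid.

    Assumption: grids are uniform in length. The cited paper partitions the
    grids by clustering appearance and velocity features instead.
    """
    n = len(frames)
    if num_grids < 1 or n < num_grids:
        raise ValueError("need at least one frame per grid")
    dim = len(frames[0])
    pooled = []
    for g in range(num_grids):
        # Integer boundaries place every frame in exactly one grid,
        # and each grid gets at least one frame because n >= num_grids.
        start = g * n // num_grids
        end = (g + 1) * n // num_grids
        chunk = frames[start:end]
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


def grid_descriptor(frames: Sequence[Sequence[float]], num_grids: int) -> List[float]:
    """Concatenate the pooled grids into one fixed-length video descriptor."""
    return [value for grid in temporal_grids(frames, num_grids) for value in grid]
```

For example, `temporal_grids([[0.0], [2.0], [4.0], [6.0]], 2)` returns `[[1.0], [5.0]]`: the first grid averages frames 0 and 1, the second averages frames 2 and 3. Concatenating grids in temporal order keeps coarse ordering information that a single global pooling step would discard.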


“…This can be done by a technique of regression on the confusion matrix in the training phase.
[23]: 70.26
VA-LSTM [22]: 79.4
CMN [24]: 80.8
STA-hands [2]: 82.5
Glimpse Clouds [3]: 86.6
Proposed Method: 87.09
Proposed Method (with I3D): 92.20
Table 4: Recognition Accuracy comparison for CAD-60, CAD-120, MSRDailyActivity3D (Performance of baseline is taken from [8,12,7] respectively) and NTU-RGB+D dataset.…”

Section: Results (mentioning)

confidence: 99%

“…Early models extract CNN features from video frames and aggregate them with pooling for classification by an SVM. The authors in [5,8] use different body-part patches to extract features from a convolutional network in order to recognize actions. The requirement to introduce spatio-temporal relationships in videos motivated the authors in [4] to use 3D convolutions.…”

Section: Related Work On Action Recognition (mentioning)

confidence: 99%

A New Hybrid Architecture for Human Activity Recognition from RGB-D Videos

Das, Thonnat, Sakhalkar, et al. 2018

MultiMedia Modeling

Self Cite


Activity recognition from RGB-D videos is still an open problem due to the large variety of actions. In this work, we propose a new architecture that mixes a high-level handcrafted strategy with machine learning techniques. We propose a novel two-level fusion strategy that combines features from different cues to address the large variety of actions. As similar actions are common in daily living activities, we also propose a mechanism for discriminating similar actions. We validate our approach on four public datasets, CAD-60, CAD-120, MSRDailyActivity3D, and NTU-RGB+D, improving the state-of-the-art results on them.


“…In our work we utilize OpenPose [53,52], to extract skeletal joint coordinates, as in [56,57]. This library returns 2D skeletal coordinates (x_i, y_i, c_i), for i =…”

Section: Skeleton Extraction Module (mentioning)

confidence: 99%

A real-time human-robot interaction framework with robust background invariant hand gesture detection

Mazhar, Navarro, Ramdani, et al. 2019

Robotics and Computer-Integrated Manufacturing

104


In the light of factories of the future, to ensure productive and safe interaction between robot and human coworkers, it is imperative that the robot extracts the essential information of the coworker. We address this by designing a reliable framework for real-time safe human-robot collaboration, using static hand gestures and 3D skeleton extraction. OpenPose library is integrated with Microsoft Kinect V2, to obtain a 3D estimation of the human skeleton. With the help of 10 volunteers, we recorded an image dataset of alphanumeric static hand gestures, taken from the American Sign Language. We named our dataset OpenSign and released it to the community for benchmarking. Inception V3 convolutional
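The (x_i, y_i, c_i) keypoint format quoted in the citation statement for this paper matches OpenPose's JSON output, where each detected person's `pose_keypoints_2d` field is a flat list of x, y, confidence triples. The Python sketch below regroups that list into per-joint triples and masks low-confidence joints. The helper names and the 0.1 threshold are hypothetical choices, not values taken from the cited works.

```python
import json
from typing import List, Optional, Tuple

Joint = Tuple[float, float, float]  # (x_i, y_i, c_i)


def parse_keypoints(flat: List[float]) -> List[Joint]:
    """Group a flat [x0, y0, c0, x1, y1, c1, ...] list into per-joint triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def reliable_joints(joints: List[Joint], min_conf: float = 0.1) -> List[Optional[Tuple[float, float]]]:
    """Keep (x, y) for joints whose confidence reaches min_conf; None marks
    an unreliable joint. min_conf is a hypothetical threshold."""
    return [(x, y) if c >= min_conf else None for x, y, c in joints]


def skeletons_from_json(text: str) -> List[List[Joint]]:
    """Read every person's 2D skeleton from one OpenPose-style JSON frame."""
    data = json.loads(text)
    return [parse_keypoints(person["pose_keypoints_2d"]) for person in data.get("people", [])]
```

Keeping a `None` placeholder instead of dropping a joint preserves the joint index i, so downstream features can still refer to a specific body part by position.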
