2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020
DOI: 10.1109/cvpr42600.2020.00022

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Cited by 844 publications (634 citation statements)
References 22 publications
“…Bone has proven to be another form of spatial information as important as joints and has been used in many recent methods [32, 37, 38, 41, 42, 43, 44]. The original skeleton data only contain the 3D coordinates of all joints in the skeleton, and the bone stream data are obtained by vector calculation of the original joint stream data.…”
Section: Methods (citation type: mentioning)
confidence: 99%
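
The "vector calculation" mentioned in the statement above is commonly implemented as a per-bone difference between a joint and its parent joint. The following is a minimal sketch under that assumption, not the cited authors' code. The 5-joint skeleton, the `BONE_PAIRS` list, and the `joints_to_bones` helper are hypothetical names introduced for illustration only.

```python
import numpy as np

# Hypothetical 5-joint skeleton given as (child, parent) index pairs.
# Joint 0 is treated as the root and has no parent (assumption).
BONE_PAIRS = [(1, 0), (2, 1), (3, 1), (4, 1)]


def joints_to_bones(joints, pairs=BONE_PAIRS):
    """Derive a bone stream from a joint stream.

    joints: array of shape (T, V, 3) holding 3D coordinates of V joints
            over T frames.
    Returns an array of the same shape in which entry [:, child] is the
    vector pointing from the parent joint to the child joint. The root
    joint keeps a zero vector so both streams share one shape.
    """
    joints = np.asarray(joints, dtype=float)
    bones = np.zeros_like(joints)
    for child, parent in pairs:
        bones[:, child] = joints[:, child] - joints[:, parent]
    return bones


# Toy joint stream: 2 frames, 5 joints, all at the origin except two.
joints = np.zeros((2, 5, 3))
joints[:, 1] = [0.0, 1.0, 0.0]
joints[:, 2] = [0.0, 2.0, 0.0]
bones = joints_to_bones(joints)
```

Keeping the bone array the same shape as the joint array is a design choice in this sketch: it lets both streams be fed to identically shaped models, which is how two-stream setups are usually arranged.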
“…All of our experiments were implemented on the PyTorch deep learning framework, and our model was trained on 4 GTX-1080Ti GPUs. In the data processing, we were consistent with [32, 37, 38, 41, 42] to ensure the fairness of model comparison. We explored two different training methods: one was the end-to-end training method, and the other was the two-stage training method.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…Recently, various types of methods have been proposed that have obtained good results for this problem. The best-performing solutions for methods based on 3D skeleton joints [ 30 , 31 , 32 , 33 ] combining the advantages of two important types of networks: Temporal Convolutional Neural Network [ 34 ] used to capture temporal dependencies and Graph Convolutional Neural Network [ 35 ] used to model spatial dependencies. In contrast, the solution we propose, starting from the analyzed models and the observations that we presented in the previous works [ 36 , 37 ], is based on a multi-stage architecture based on linear layers used to extract features and long short-term memory layers used to model the sequence of frames to obtain the correct action.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…While human perception typically involves inferring the physical attributes about the humans (detection [5,35,43,50], poses [3,4,8,25,28,41], shape [13,20,29,30], gaze [44] etc.), interpreting humans involves reasoning about the finer details relating to human activity [6,24,27,48,49], behaviour [26,34], human-object visual relationship detection [23,33,36,37,39,40], and human-object interactions [23,32,33,36,37,39,40,42]. In this work, we investigate the problem of identifying Human-Object Interactions in videos.…”
Section: Introduction (citation type: mentioning)
confidence: 99%