2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.115
Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks

Cited by 375 publications (215 citation statements) · References 20 publications
“…In addition, since DD-net employs one-dimensional CNNs to extract the feature, it is much faster than other models that use RNNs [31], [22], [32], [25] or 2D/3D CNNs [5], [39], [7], [8], [28]. During its inferences, DD-Net's speed can reach around 3,500 FPS on one GPU (i.e., GTX 1080Ti), or, 2,000 FPS on one CPU (i.e., Intel E5-2620).…”
Section: Results Analysis and Discussion
confidence: 99%
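The speed figures quoted above stem from DD-Net replacing recurrence with one-dimensional convolutions over the time axis, so every frame window is processed by the same cheap dot products. A minimal sketch of such a temporal 1D convolution over a skeleton feature sequence (shapes and names are illustrative, not the actual DD-Net layers):

```python
import numpy as np

def temporal_conv1d(seq, weights):
    """Valid-mode 1D convolution along the time axis.

    seq:     (T, F) array - T frames, F skeleton features per frame
    weights: (K, F, C) array - kernel length K, F input features, C output channels
    returns: (T - K + 1, C) array of convolved features
    """
    T, F = seq.shape
    K, F2, C = weights.shape
    assert F == F2, "feature dimension of kernel must match the sequence"
    out = np.empty((T - K + 1, C))
    for t in range(T - K + 1):
        window = seq[t:t + K]                      # (K, F) slice of frames
        out[t] = np.einsum('kf,kfc->c', window, weights)
    return out

# Toy run: 5 frames, 2 features, kernel of length 3 with 4 output channels.
features = temporal_conv1d(np.ones((5, 2)), np.ones((3, 2, 4)))
```

Because each output step is a fixed-size window, all steps can be computed in parallel, unlike an RNN whose step t depends on step t-1; that is the structural reason a 1D-CNN model can reach thousands of FPS on a single GPU.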
“…1 (c)), solely using the JCD feature is insufficient. Unlike previous works that only utilize either the geometric feature [18], [22] or the Cartesian coordinate feature [24], [25], [26], [27], our DD-Net seamlessly integrates both of them.…”
Section: B. Modeling Global Scale-invariant Motions by a Two-scale Mo…
confidence: 99%
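The JCD (joint collection distances) feature referred to above is a geometric descriptor built from pairwise distances between skeleton joints, which makes it invariant to the viewpoint, unlike raw Cartesian coordinates. A small sketch of one plausible construction, assuming JCD is the flattened upper triangle of the pairwise Euclidean distance matrix (the exact normalization used by DD-Net is not shown here):

```python
import numpy as np

def jcd_feature(joints):
    """Pairwise joint-distance descriptor for one skeleton frame.

    joints: (N, 3) array of 3D joint positions
    returns: (N * (N - 1) / 2,) array - upper triangle of the distance matrix
    """
    # Broadcast to an (N, N, 3) array of difference vectors, then take norms.
    diffs = joints[:, None, :] - joints[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)         # (N, N) symmetric matrix
    iu = np.triu_indices(len(joints), k=1)         # strict upper triangle
    return dists[iu]

# Toy run: 3 joints yield 3 * 2 / 2 = 3 pairwise distances.
feat = jcd_feature(np.array([[0.0, 0.0, 0.0],
                             [1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0]]))
```

A descriptor like this is rotation- and translation-invariant by construction; combining it with the Cartesian coordinate feature, as the quoted passage describes, restores the global motion information that pure distances discard.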
“…Note that there are essential differences between the proposed action reasoning approach and many deep learning based action recognition methods [8,9,23,36,44,45]: (1) Instead of only predicting a single action label, our method outputs multiple action labels with relevant objects, attributes/relationships and the time of each state transition. (2) Our action models are learned from semanticlevel state transitions based definitions (state detectors are trained on still images), and thus it does not need well-annotated video clips for training.…”
Section: Action Recognition Accuracy
confidence: 99%
“…It is worth mentioning that TSN achieves the state-of-the-art performance 94.9% and 89.6% on the benchmark action recognition dataset UCF 101 [38] and ActivityNet [6], respectively. Similar to the other popular action recognition methods [8,9,23,36,44], the output of TSN is only an action label for a video sequence. In our experiment, the TSN model is trained with 50 epochs for both appearance and optical flow models, and the best recognition results are reported for comparison.…”
Section: Action Recognition Accuracy
confidence: 99%
“…For small human action recognition datasets, deep learning methods may not give the best performance. Recent Kinect-based human action recognition algorithms include [14, 18, 20–22, 25–38]. Research contributions.…”
Section: Introduction
confidence: 99%