Human Action Recognition by Learning Spatio-Temporal Features With Deep Neural Networks
2018 · DOI: 10.1109/access.2018.2817253

Cited by 95 publications (40 citation statements: 0 supporting, 40 mentioning, 0 contrasting)
References 32 publications
“…UCF Sports
  Method                               Accuracy
  (Souly and Shah, 2016) [21]          85.10%
  (Wang et al., 2009) [16]             85.60%
  (Le et al., 2011) [1]                86.50%
  (Kovashka and Grauman, 2010) [17]    87.20%
  (Wang et al., 2011) [18]             89.10%
  (Weinzaepfel et al., 2015) [22]      90.50%
  (Abdulmunem et al., 2016) [20]       90.90%
  (Ravanbakhsh et al., 2015) [37]      88.10%
  (Wang et al., 2018) [39]             91.89%
  (Zhou et al., 2017) [28]             90.00%
  Basic method                         85.30%
  Proposed method                      92.00%

UCF-11
  Method                               Accuracy
  (Hasan et al., 2014) [2]             54.50%
  (Liu et al., 2009) [43]              71.20%
  (Ikizler-Cinbis et al., 2010) [36]   75.20%
  (Wang et al., 2011) [18]             84.20%
  (Sharma et al., 2015) [27]           84.90%
  (Cho et al., 2014) [19]              88.00%
  (Ravanbakhsh et al., 2015) [37]      77.10%
  (Wang et al., 2018) [39]             98.76%
  (Gammulle et al., 2017) [38]         89.20%
  (Gilbert et al., 2017) [3]           86.70%
  Basic method                         82.40%
  Proposed method                      92.40%…”
Section: UCF Sports Methods (mentioning)
confidence: 99%
“…Wang et al. [39] proposed a lightweight architecture for video action recognition consisting of a CNN, an LSTM, and an attention model. They used the convolutional model to extract two kinds of features (semantic and spatial) for each frame, followed by an FC-LSTM with their temporal-wise attention model.…”
Section: Introduction (mentioning)
confidence: 99%
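The statement above describes a common per-frame-CNN → LSTM → temporal-attention pipeline. Below is a minimal PyTorch sketch of that pattern; the ResNet-18 backbone, the layer sizes, and the single linear attention head are illustrative assumptions, not the exact architecture of Wang et al. [39].

```python
# Sketch of a CNN -> LSTM -> temporal-attention classifier for video clips.
# Backbone, hidden size, and attention head are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision.models as models

class AttnCNNLSTM(nn.Module):
    def __init__(self, num_classes=11, hidden=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # per-frame features
        self.lstm = nn.LSTM(512, hidden, batch_first=True)         # temporal model
        self.attn = nn.Linear(hidden, 1)                           # temporal-wise attention score
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):                              # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # (B*T, 512)
        h, _ = self.lstm(feats.view(b, t, -1))             # (B, T, hidden)
        w = torch.softmax(self.attn(h), dim=1)             # (B, T, 1), weights over frames
        pooled = (w * h).sum(dim=1)                        # attention-weighted temporal pooling
        return self.fc(pooled)

logits = AttnCNNLSTM()(torch.randn(2, 16, 3, 224, 224))    # (2, num_classes)
```

The attention weights let the classifier emphasize informative frames rather than averaging all time steps uniformly.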
“…The ConvLSTM network [124] extends the LSTM with convolution operations, so it can model the temporal and spatial correlations in video simultaneously and effectively fuse temporal and spatial features; it has been applied to action recognition [125,126] and gesture recognition [127,128]. Courtney and Sreenivas [129] replaced the convolution layers in ResNet with ConvLSTM layers to extend the ResNet structure's ability to learn spatio-temporal characteristics, and used CTC to train the network.…”
Section: Other Deep Learning Network (mentioning)
confidence: 99%
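For reference, a minimal sketch of a ConvLSTM cell: the LSTM's fully connected gate transforms are replaced by a single convolution over the concatenated input and hidden state, so the hidden state keeps a spatial layout. The kernel size and channel counts below are illustrative, not those of [124] or [129].

```python
# Minimal ConvLSTM cell: convolutional gates over a spatial hidden state.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One convolution produces all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):                           # x: (B, C_in, H, W)
        h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)                      # convolutional cell update
        h = o * torch.tanh(c)                              # spatial hidden state
        return h, c

cell = ConvLSTMCell(3, 16)
h = c = torch.zeros(1, 16, 32, 32)
for frame in torch.randn(8, 1, 3, 32, 32):                 # unroll over 8 frames
    h, c = cell(frame, (h, c))
```

Because the gates are convolutions, each hidden unit sees a local spatial neighborhood at every time step, which is what lets the cell integrate motion and appearance jointly.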
“…The success of deep learning methods in image processing [15] and in action recognition [15], [16], [18] motivated researchers to apply these methods to abnormal-activity recognition. Consequently, deep auto-encoder based approaches [5]-[7] have been proposed to learn the features of normal activities automatically, but generalizing these methods to real-world scenarios is difficult.…”
Section: Related Work (mentioning)
confidence: 99%
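A hedged sketch of the auto-encoder idea cited in [5]-[7]: train a reconstruction network on normal-activity data only, then treat a large reconstruction error as a sign of abnormality. The fully connected architecture, feature dimension, and 3-sigma threshold below are assumptions for illustration, not the cited papers' designs.

```python
# Auto-encoder trained only on normal-activity features; high reconstruction
# error at test time flags an input as abnormal. Sizes are illustrative.
import torch
import torch.nn as nn

class ActivityAE(nn.Module):
    def __init__(self, dim=1024, code=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, code))
        self.dec = nn.Sequential(nn.Linear(code, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

ae = ActivityAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
normal = torch.randn(512, 1024)                  # stand-in for normal-activity features
for _ in range(100):                             # fit the normal data only
    loss = nn.functional.mse_loss(ae(normal), normal)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    err = ((ae(normal) - normal) ** 2).mean(dim=1)   # per-sample reconstruction error
    threshold = err.mean() + 3 * err.std()           # flag err > threshold as abnormal
```

The generalization difficulty the statement mentions shows up here directly: the threshold is calibrated on the training distribution, so a shift in scene, viewpoint, or feature statistics can make normal real-world inputs reconstruct poorly.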