2021
DOI: 10.1109/access.2020.3048741

Deep Neural Networks Using Capsule Networks and Skeleton-Based Attentions for Action Recognition

Abstract: This work develops Deep Neural Networks (DNNs) by adopting Capsule Networks (CapsNets) and spatiotemporal skeleton-based attention to effectively recognize subject actions from abundant spatial and temporal contexts of videos. The proposed generic DNN includes four 3D Convolutional Neural Networks (3D_CNNs), Attention-Jointed Appearance (AJA) and Attention-Jointed Motion (AJM) generation layers, two Reduction Layers (RLs), two Attention-based Recurrent Neural Networks (A_RNNs), and an inference classifier, wh…

Cited by 23 publications (25 citation statements)
References 33 publications

“…Sports Training. In the process of human motion capture, because the inertial sensor is self-sufficient and has no external reference point, it cannot obtain spatial displacement information, so it is necessary to use positioning technology to obtain displacement information in the process of human motion capture [20]. Wide-area positioning and short-range positioning are two types of wireless positioning technologies.…”
Section: Realization Of Body Movement Detection In Competitive (mentioning)
confidence: 99%
“…In order to improve performance in visual perception, several generations of CNNs have been created with the input vectors taking care of one image or multiple images. Particularly, multiple images are commonly adopted as an input vector which has the embedded temporal information as well as the spatial information [2], [4]. In addition to improving learning, many researchers used temporal networks to perform large-scale visual learning and activity classification from video clips, where temporal networks had recurrent connections to aid in video context understanding regarding time [2], [4]- [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…The motion being performed can be at a fast-refreshing speed, and individual frames can be ambiguous. Therefore, motion cues provide a necessary approach by allowing the compensated optical flows to pick up potential [2], [4]. Another important reason is that current CNNs architectures are not able to take full advantage of temporal information and their performance is consequently often dominated by appearance recognition.…”
Section: Introduction (mentioning)
confidence: 99%
“…On the other hand, DNN models have achieved human-level performance and have shown great success in different real-world applications, including computer vision [ 16 ], textile process, biomedical engineering [ 17 ], material engineering [ 18 ]. DNN is an efficient machine learning tool suitable for the prediction of output parameters from input variables where there is an unknown relationship exists between input and output variables [ 19 , 20 , 21 ]. In recent years, DNN has been widely used to predict various properties of textiles.…”
Section: Introduction (mentioning)
confidence: 99%