2021
DOI: 10.1049/cvi2.12075

Multi‐stream adaptive spatial‐temporal attention graph convolutional network for skeleton‐based action recognition

Abstract: Skeleton-based action recognition algorithms have been widely applied to human action recognition. Graph convolutional networks (GCNs) generalize convolutional neural networks (CNNs) to non-Euclidean graphs and achieve significant performance in skeleton-based action recognition. However, existing GCN-based models have several issues, such as the topology of the graph is defined based on the natural skeleton of the human body, which is fixed during training, and it may not be applied to different layers of the…

Cited by 6 publications (4 citation statements)
References 32 publications
“…Multiplying this kernel with the feature vector adjacent to the point gives the K × K × C feature map, and finally the K × K × C kernel is superimposed to obtain the final output feature map, with Involution generating different kernels for different locations and sharing a single kernel at the same location on the channel [12]. The traditional Convolution kernel counts and Involution counts are shown in Equation (1).…”
Section: Involution Feature Extraction Network Based Human Pose Recog...
Citation type: mentioning
confidence: 99%
“…In Equation (12), N is the number of nodes in the network and U is the set of all nodes in the network.…”
Section: ( )
Citation type: mentioning
confidence: 99%
“…EfficientGCN [35] considered joints, bone and velocity to increase the capacity of their model. Furthermore, joints motion and bones motion are added by [44] as extra streams in the feature learning pipeline. However, the view stream has not been formally proposed and is well designed in the field due to our best knowledge.…”
Section: Related Work, 2.1 Skeleton-based Action Recognition
Citation type: mentioning
confidence: 99%
“…For instance, some of their work consider the different scale of receptive field [6], capturing both the short-term trajectory and long-term trajectory [5], or design multi-scale ST-GCN [42]. Another group of trending approaches [32,35,44] leverage the capacity of GCN models by introducing multiple streams. They usually take joints (relative and absolute), velocity (joints motion and bones motion) and bones as inputs and fuse all the features together as the final representation.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
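
The multi-stream inputs described in the statement above (joints, bones, and the motion of each, fused into one representation) can be sketched roughly as follows. This is an illustrative sketch only, not the method of the reported paper or of any citing paper: the function names, the toy three-joint skeleton, the `parents` list, the zero padding of the last frame, and the choice of per-joint concatenation as the fusion step are all assumptions made for the example.

```python
from typing import List, Sequence

# One frame is a list of joints; each joint is a list of coordinates.
Frame = List[List[float]]


def bone_stream(frames: Sequence[Frame], parents: Sequence[int]) -> List[Frame]:
    """Bone vectors: each joint minus its parent joint (the root's bone is zero)."""
    return [
        [[c - p for c, p in zip(frame[j], frame[parents[j]])]
         for j in range(len(frame))]
        for frame in frames
    ]


def motion_stream(frames: Sequence[Frame]) -> List[Frame]:
    """Frame-to-frame differences; the last frame is padded with zeros (assumption)."""
    out: List[Frame] = []
    for t in range(len(frames)):
        if t + 1 < len(frames):
            out.append([[b - a for a, b in zip(ja, jb)]
                        for ja, jb in zip(frames[t], frames[t + 1])])
        else:
            out.append([[0.0] * len(joint) for joint in frames[t]])
    return out


def fuse_features(streams: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Fusion assumed here: concatenate the streams' features joint by joint."""
    fused: List[Frame] = []
    for per_frame in zip(*streams):          # the same frame taken from every stream
        fused.append([[x for joint in per_joint for x in joint]
                      for per_joint in zip(*per_frame)])
    return fused


# Hypothetical toy skeleton: joint 0 is the root (its own parent),
# joint 1 attaches to joint 0, and joint 2 attaches to joint 1.
parents = [0, 0, 1]
joints = [
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],   # frame 0
    [[0.0, 0.0], [1.0, 1.0], [2.0, 1.0]],   # frame 1
]

bones = bone_stream(joints, parents)
joint_motion = motion_stream(joints)
bone_motion = motion_stream(bones)
fused = fuse_features([joints, bones, joint_motion, bone_motion])
```

In this sketch each joint ends up with eight features per frame (two coordinates from each of the four streams). Models in this family may instead train one network per stream and combine their outputs, so the concatenation here should be read as one possible fusion choice.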