2020
DOI: 10.1109/access.2020.2968054
Context-Aware Cross-Attention for Skeleton-Based Human Action Recognition

Abstract: Skeleton-based human action recognition is becoming popular due to its computational efficiency and robustness. Since not all skeleton joints are informative for action recognition, attention mechanisms are adopted to extract informative joints and suppress the influence of irrelevant ones. However, existing attention frameworks usually ignore helpful scenario context information. In this paper, we propose a cross-attention module that consists of a self-attention branch and a cross-attention branch for skeleton…
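The truncated abstract describes a module with a self-attention branch over skeleton joints and a cross-attention branch that pulls scenario context from the RGB video. The following is a minimal, hypothetical PyTorch sketch of such a two-branch block; all names, shapes, and the fusion-by-sum are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchAttention(nn.Module):
    """Illustrative sketch (not the authors' code): skeleton joint features
    run through a self-attention branch (joints attend to joints) and a
    cross-attention branch (joints attend to RGB scene-context features)."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale = dim ** -0.5
        # Self-attention branch: queries, keys, values all from joints.
        self.q_self = nn.Linear(dim, dim)
        self.kv_self = nn.Linear(dim, 2 * dim)
        # Cross-attention branch: queries from joints, keys/values from RGB context.
        self.q_cross = nn.Linear(dim, dim)
        self.kv_cross = nn.Linear(dim, 2 * dim)

    def _attend(self, q, k, v):
        # Standard scaled dot-product attention.
        w = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return w @ v

    def forward(self, joints, context):
        # joints:  (B, num_joints, dim)  skeleton joint features
        # context: (B, num_regions, dim) features from the raw RGB frames
        k_s, v_s = self.kv_self(joints).chunk(2, dim=-1)
        k_c, v_c = self.kv_cross(context).chunk(2, dim=-1)
        self_out = self._attend(self.q_self(joints), k_s, v_s)
        cross_out = self._attend(self.q_cross(joints), k_c, v_c)
        # Fusion rule is not specified in the truncated abstract;
        # a simple residual sum is assumed here.
        return joints + self_out + cross_out

# Example shapes: 25 NTU-RGB+D joints, a hypothetical 7x7 grid of RGB regions.
block = TwoBranchAttention(dim=256)
out = block(torch.randn(8, 25, 256), torch.randn(8, 49, 256))  # (8, 25, 256)
```

The sum fusion keeps the sketch short; the actual module may weight or gate the two branches differently.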

Cited by 34 publications (18 citation statements)
References 31 publications
“…On the NTU-RGB+D dataset, the model achieves state-of-the-art performance, which verifies its effectiveness in action recognition tasks.

Method                 Cross-Subject  Cross-View
DSSCA-SSLM [32]        74.9%          —
TCN [33]               74.3%          83.10%
GCA-LSTM [16]          76.1%          84.0%
SkeleMotion [34]       76.5%          84.7%
SlowFastNet [12]       80.25%         93.74%
ST-GCN [18]            81.5%          88.3%
LSTM-CNN [35]          82.9%          91.0%
Two-stream CNN [36]    83.2%          89.3%
DPRL+GCNN [19]         83.5%          89.8%
Cross-attention [21]   84.2%          89.3%
SV-GCN (ours)          85.51%         94.15%”
Section: Discussion
confidence: 99%
“…Among action recognition methods based on multiple data sources, Fan et al. [21] proposed context-aware cross-attention for skeleton-based human action recognition: a cross-attention module that extracts context information directly from the original RGB video and applies it to skeleton-based action recognition. However, this method uses only the scene context information in the original video, cannot fully exploit all of the information the video contains, and does not use a GCN-based method to model the skeleton data.…”
Section: Related Work
confidence: 99%
“…Therefore, they capture deep contextualized information about the input sequence [18]. Fan et al. [19] proposed a self-attention module that helps extract joint features that are more informative and highly related to the context information of the corresponding scene, so as to suppress the influence of irrelevant joint features. The experimental results show an accuracy of 98.59% when the self-attention mechanism is added.…”
Section: Deep Learning of Action Feature Presentation Based on Attention Mechanism
confidence: 99%
“…Human-object interaction is a very popular research area in computer vision due to its wide applicability to, e.g., healthcare monitoring [9], assisted living [10], surveillance [11], motion-sensing games [12], and content-based video indexing and retrieval [13,14]. Thus, there is a need for a more reliable and accurate system.…”
Section: Introduction
confidence: 99%