Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

Papadopoulos, Konstantinos; Ghorbel, Enjie; Aouada, Djamila; Ottersten, Björn

doi:10.1109/icpr48806.2021.9413189

Cited by 12 publications

(8 citation statements)

References 39 publications


“…For example, the FGCN model outperforms Two-Stream Attention LSTM [61] by over 24% on both the cross-subject and cross-setup benchmarks. Our FGCN model outperforms the most recent methods, such as GVFE + AS-GCN [63] and ST-TR [66] on both the cross-subject and cross-setup benchmarks of the NTU-RGB+D120 dataset.…”

Section: Models (mentioning)

confidence: 85%


Feedback Graph Convolutional Network for Skeleton-Based Action Recognition

Yang, Duan, Zhang, et al. 2022

IEEE Trans. on Image Process.


Skeleton-based action recognition has attracted considerable attention since the skeleton data is more robust to the dynamic circumstances and complicated backgrounds than other modalities. Recently, many researchers have used the Graph Convolutional Network (GCN) to model spatial-temporal features of skeleton sequences by an end-to-end optimization. However, conventional GCNs are feedforward networks for which it is impossible for the shallower layers to access semantic information in the high-level layers. In this paper, we propose a novel network, named Feedback Graph Convolutional Network (FGCN). This is the first work that introduces a feedback mechanism into GCNs for action recognition. Compared with conventional GCNs, FGCN has the following advantages: (1) A multi-stage temporal sampling strategy is designed to extract spatial-temporal features for action recognition in a coarse to fine process; (2) A Feedback Graph Convolutional Block (FGCB) is proposed to introduce dense feedback connections into the GCNs. It transmits the high-level semantic features to the shallower layers and conveys temporal information stage by stage to model video level spatial-temporal features for action recognition; (3) The FGCN model provides predictions on-the-fly. In the early stages, its predictions are relatively coarse. These coarse predictions are treated as priors to guide the feature learning in later stages, to obtain more accurate predictions. Extensive experiments on three datasets, NTU-RGB+D, NTU-RGB+D120 and Northwestern-UCLA, demonstrate that the proposed FGCN is effective for action recognition. It achieves the state-of-the-art performance on all three datasets.



“…When FGCN is fed with more observations of actions in the subsequent stages, it gets higher accuracies.
[64] 67.9 62.8
Shift-GCN (2-stream) (CVPR 2020) [65] 85.3 86.6
Shift-GCN (4-stream) (CVPR 2020) [65] 85.9 87.6
ST-TR (CVIU 2021) [66] 85.1 87.1
GVFE + AS-GCN (ICPR 2021) [63] 79.2 81.2
FGCN (ours) 85.4 87.4…”

Section: Models (mentioning)

confidence: 99%

Feedback Graph Convolutional Network for Skeleton-Based Action Recognition

Yang, Duan, Zhang, et al. 2022

IEEE Trans. on Image Process.


“…Instead of only use raw skeleton features (joint coordinates and/or bone lengths) like all above GCN‐based methods to construct the spatial‐temporal graphs, Papadopoulos et al. [58] introduced a more compact and efficient graph‐based framework to solve their certain limitations. It includes two modules, the Graph Vertex Feature Encoder (GVFE) module (Figure 10) learned appropriate vertex features by encoding raw skeleton data into a new feature space and the Dilated Hierarchical Temporal Convolutional Network (DH‐TCN) module was capable of capturing both short‐term and long‐term temporal dependencies using a hierarchical dilated convolutional network.…”

Section: Deep Learning‐based Action Recognition With 3D Skeleton (mentioning)

confidence: 99%

Deep learning‐based action recognition with 3D skeleton: A survey

Xing, Zhu 2021

CAAI Trans on Intel Tech


Action recognition based on 3D skeleton data has attracted much attention due to its wide application, and it is one of the most popular research topics in computer vision. The 3D skeleton data is an effective representation of motion dynamics and is not easily affected by light, scene variation, etc. Previous research on action recognition has mainly focused on video or RGB data methods. In recent years, the advantages of combining skeleton data and deep learning have been gradually demonstrated, many impressive methods have been proposed, especially GCN-based methods. In this survey, we first introduce the development process of 3D skeleton-data action recognition and the classification of graph convolutional network, then introduce the commonly used NTU RGB + D and NTU RGB + D 120 datasets. Finally, a detailed review of existing variants of three mainstream technologies is provided based on deep learning and their performance was compared from three dimensions. To the best of our knowledge, this is the first research to integrate the research of GCN-based method and its various evolutionary methods. Comparative investigation of existing variants of research in action-recognition task from different perspectives is made, a generic framework is described, state-of-the-art practices are summarized, and the emerging trends of this topic are explored. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
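The survey excerpt quoted above describes DH-TCN as covering short-term and long-term temporal dependencies through hierarchical dilated convolutions. The following is a minimal sketch of that general idea only, not the authors' implementation: the function names, the kernel size of 3, and the dilations 1, 2, 4 are illustrative assumptions that do not come from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of stacked, stride-1
    dilated 1D convolutions (one layer per entry in `dilations`)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' dilated 1D correlation over a plain list of numbers.
    Kernel taps are spaced `dilation` frames apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


# Hypothetical setting: kernel size 3, dilation doubling at each level.
short_term = receptive_field(3, [1])        # a single undilated layer
long_term = receptive_field(3, [1, 2, 4])   # three stacked levels
```

Under these assumed values the first level alone covers 3 frames, while the three-level stack covers 15, which is how increasing dilation widens temporal context without adding kernel weights.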


“…Cai et al [1] proposes to add flow patches to handle subtle movements into a GCN. Approaches based on GCN [6,23,37,47] have been constantly improving the state-of-the-art on skeleton-based action recognition recently.…”

Section: Skeleton-based Action Recognition (mentioning)

confidence: 99%

Fusion-GCN: Multimodal Action Recognition using Graph Convolutional Networks

Michael, Memmesheimer, Paulus 2021

Preprint


In this paper we present Fusion-GCN, an approach for multimodal action recognition using Graph Convolutional Networks (GCNs). Action recognition methods based around GCNs recently yielded state-of-the-art performance for skeleton-based action recognition. With Fusion-GCN, we propose to integrate various sensor data modalities into a graph that is trained using a GCN model for multi-modal action recognition. Additional sensor measurements are incorporated into the graph representation either on a channel dimension (introducing additional node attributes) or spatial dimension (introducing new nodes). Fusion-GCN was evaluated on two publicly available datasets, the UTD-MHAD and MMACT datasets, and demonstrates flexible fusion of RGB sequences, inertial measurements and skeleton sequences. Our approach gets comparable results on the UTD-MHAD dataset and improves the baseline on the large-scale MMACT dataset by a significant margin of up to 12.37% (F1-Measure) with the fusion of skeleton estimates and accelerometer measurements.
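The abstract above distinguishes two ways of adding sensor data to a skeleton graph: on the channel dimension (extra attributes per node) or on the spatial dimension (extra nodes). A rough sketch of that distinction for a single frame follows. The function names, the list-based graph representation, and the choice of which joint a sensor attaches to are all hypothetical and are not taken from the Fusion-GCN paper.

```python
def fuse_channel(node_feats, extra_feats):
    """Channel-dimension fusion: append extra attributes to every
    existing node, so the node count stays the same."""
    if len(node_feats) != len(extra_feats):
        raise ValueError("need one extra feature vector per node")
    return [list(f) + list(e) for f, e in zip(node_feats, extra_feats)]


def fuse_spatial(node_feats, edges, sensor_feats, attach_to):
    """Spatial-dimension fusion: add each sensor reading as a new node
    and connect it to an existing node index given in `attach_to`."""
    feats = [list(f) for f in node_feats]
    new_edges = list(edges)
    for feat, anchor in zip(sensor_feats, attach_to):
        if len(feat) != len(feats[0]):
            raise ValueError("new nodes must match the channel count")
        new_edges.append((anchor, len(feats)))  # edge to the new node
        feats.append(list(feat))
    return feats, new_edges
```

In this sketch channel fusion widens each node's feature vector, while spatial fusion leaves the feature width unchanged and grows the node and edge lists instead.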

