MTT: Multi-Scale Temporal Transformer for Skeleton-Based Action Recognition

Kong, Jun; Bian, Yuhang; Jiang, Min

doi:10.1109/lsp.2022.3142675

Cited by 41 publications

(14 citation statements)

References 26 publications

“…TEM selects multiple adjacent joints between frames, which helps to extract the relevant features of multiple adjacent joints connected in human motion. Multi-scale temporal transformer (MTT) [12] takes into account patterns at various time scales and designs multiple branches to extract various timescale features. MTT learns from skeleton sequences and models long-term time.…”

Section: Related Work (mentioning)

confidence: 99%

“…Compared with the previous convolutional neural networks (CNNs)-based methods [2–5] and recurrent neural networks (RNNs)-based methods [6–9], graph convolutional networks (GCNs) have good performance on any graph structure, and scholars are increasingly using them for skeleton-based action recognition [10–16]. Yan et al. first propose spatial-temporal GCN (ST-GCN) [17] to apply GCN to skeleton-based action recognition. According to the particularity of skeleton data, the relationship between different joints in the same dimension and the connection between joints in different dimensions are critical.…”

Section: Introduction (mentioning)

confidence: 99%

“…Compared with the previous convolutional neural networks (CNNs)-based methods [2–5] and recurrent neural networks (RNNs)-based methods [6–9], graph convolutional networks (GCNs) have good performance on any graph structure, and scholars are increasingly using them for skeleton-based action recognition [10–16]. Yan et al. first propose spatial-temporal GCN (ST-GCN) [17] to apply GCN to skeleton-based action recognition.…”

Section: Introduction (mentioning)

confidence: 99%

Feature difference and feature correlation learning mechanism for skeleton-based action recognition

2023

Self Cite

In recent years, skeleton-based action recognition has become increasingly popular in the field of human action recognition, and graph convolutional networks (GCNs) have shown clear advantages in this task. Many GCN-based methods do not sufficiently model the latent relationships between features, so the learned features are not discriminative enough. These latent feature relationships can manifest as feature differences that change due to actions and feature correlations that interact with each other. Therefore, we propose a feature difference and feature correlation learning mechanism to learn discriminative augmentation features, including feature differences in actions and feature correlations between joints. First, we propose a temporal feature difference and correlation learning (FDCL) module (TFDCL). In adjacent temporal frames, we extract feature correlations between related parts. Feature differences are captured through changes in joints over the overall long-term timeline. Second, we propose a channel FDCL module. Different channels contain different types of features for actions. We use convolution operations to interact between channels, continuously extracting the strongest features to obtain feature maps. Third, we propose a temporal channel context topology (TCCT) module to dynamically learn global contextual features of all joints during motion. Finally, experiments are conducted on the NTU-RGBD 60 dataset and the Kinetics-Skeleton 400 dataset to verify the effectiveness of the network.

“…They propose a hybrid model by integrating autoregressive and non-autoregressive models [8]. Transformer-based models can also be adapted to skeleton-based action recognition tasks [9], [10].…”

Section: Introduction (mentioning)

confidence: 99%

Fractional Fourier Transform Meets Transformer Encoder

Şahinuç; Koç

2022

IEEE Signal Process. Lett.

Utilizing signal processing tools in deep learning models has been drawing increasing attention. Fourier transform (FT), one of the most popular signal processing tools, is employed in many deep learning models. Transformer-based sequential input processing models have also started to make use of FT. In the existing FNet model, it is shown that replacing the attention layer, which is computationally expensive, with FT accelerates model training without sacrificing task performances significantly. We further improve this idea by introducing the fractional Fourier transform (FrFT) into the transformer architecture. As a parameterized transform with a fraction order, FrFT provides an opportunity to access any intermediate domain between time and frequency and find better-performing transformation domains. According to the needs of downstream tasks, a suitable fractional order can be used in our proposed model FrFNet. Our experiments on downstream tasks show that FrFNet leads to performance improvements over the ordinary FNet.

“…Today, there has been some research using a transformer-based multi-scale method on many applications. Kong et al. [22] proposed a multi-scale temporal transformer for skeleton-based action recognition. Xiao et al. [23] proposed a multi-scale spatiotemporal transformer to efficiently aggregate contextual information in long-time sequences of video frames. Yuan et al. [24] proposed a multi-scale adaptive segmentation network based on Swin Transformer for remote sensing image segmentation.…”

Section: Introduction (mentioning)

confidence: 99%

TransMF: Transformer-Based Multi-Scale Fusion Model for Crack Detection

Zhao; Qian

2022

Mathematics

Cracks are widespread in infrastructure that are closely related to human activity. It is very popular to use artificial intelligence to detect cracks intelligently, which is known as crack detection. The noise in the background of crack images, discontinuity of cracks and other problems make the crack detection task a huge challenge. Although many approaches have been proposed, there are still two challenges: (1) cracks are long and complex in shape, making it difficult to capture long-range continuity; (2) most of the images in the crack dataset have noise, and it is difficult to detect only the cracks and ignore the noise. In this paper, we propose a novel method called Transformer-based Multi-scale Fusion Model (TransMF) for crack detection, including an Encoder Module (EM), Decoder Module (DM) and Fusion Module (FM). The Encoder Module uses a hybrid of convolution blocks and Swin Transformer block to model the long-range dependencies of different parts in a crack image from a local and global perspective. The Decoder Module is designed with symmetrical structure to the Encoder Module. In the Fusion Module, the output in each layer with unique scales of Encoder Module and Decoder Module are fused in the form of convolution, which can release the effect of background noise and strengthen the correlations between relevant context in order to enhance the crack detection. Finally, the output of each layer of the Fusion Module is concatenated to achieve the purpose of crack detection. Extensive experiments on three benchmark datasets (CrackLS315, CRKWH100 and DeepCrack) demonstrate that the proposed TransMF in this paper exceeds the best performance of present baselines.
