A Self-Attention Augmented Graph Convolutional Clustering Networks for Skeleton-Based Video Anomaly Behavior Detection

Liu, Chengming; Fu, Ronghua; Li, Yinghao; Gao, Yufei; Shi, Lei; Li, Weiwei

doi:10.3390/app12010004

Cited by 15 publications

(11 citation statements)

References 29 publications

“…In the fine-tuning step, the entire network was fine-tuned using a multi-objective loss function, composed of reconstruction loss, prototype generation loss and cluster loss. Later, Liu et al [38] used self-attention augmented graph convolutions for detecting abnormal human behaviours based on skeleton graphs. Skeleton graphs were fed as input to a spatio-temporal self-attention augmented GCAE and latent features were extracted from the encoder part of the trained GCAE.…”

Section: Combinations Of Learning Approaches (mentioning)

confidence: 99%

Skeletal Video Anomaly Detection using Deep Learning: Survey, Challenges and Future Directions

Mishra, Mihailidis, Khan. 2023. Preprint.

The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.

“…Markovitz [2] used GCN to learn skeletal joint dependencies for representing behavioural features and performed clustering through soft assignments. Liu [3] extracted local and global features of the skeleton and generated latent vectors for clustering. Luo [24] stacked multilayer ST‐GCN and introduced the Resnet mechanism to detect anomalies by calculating the mean squared error between predicted joints and ground truth.…”

Section: Related Work (mentioning)

confidence: 99%

“…These methods are mostly considered as unsupervised learning and can be divided into two types of methods. The first method extracts features from the original video and then uses clustering [1][2][3] or classifiers [4,5] for anomaly detection. The second method reconstructs or predicts the input sequence and calculate the error between the real sequence and the generated sequence.…”

Section: Introduction (mentioning)

confidence: 99%

“…Subsequently, many studies have improved ST‐GCN. Similar to skeleton‐based action recognition methods [23], most studies [2–5, 16, 24–26] modelled the spatio‐temporal information of motion targets using the ST‐GCN. Although GCN has strong spatial feature learning capability, it tends to ignore global dependencies, which are crucial for behavioural understanding.…”

Section: Introduction (mentioning)

confidence: 99%

A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection

Zhu, Wei. 2023. IET Computer Vision.

Due to the robustness of skeleton data to human scale, illumination changes, dynamic camera views, and complex backgrounds, great progress has been made in skeleton‐based video anomaly detection in recent years. The spatio‐temporal graph convolutional network has been proven to be effective in modelling the spatio‐temporal dependencies of non‐Euclidean data such as human skeleton graphs, and the autoencoder based on this basic unit is widely used to model sequence features. However, due to the limitations of the convolution kernel, the model cannot capture the correlation between non‐adjacent joints, and it is difficult to deal with long‐term sequences, resulting in an insufficient understanding of behaviour. To address this issue, this paper applies the Transformer to the human skeleton and proposes the Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder (STEGT‐AE) to improve the capability of modelling. In addition, the multi‐memory model with skip connections is employed to provide different levels of coding features, thereby enhancing the ability of the model to distinguish similar heterogeneous behaviours. Furthermore, the STEGT‐AE has a single encoder‐double decoder architecture, which can improve the detection performance by combining the reconstruction and prediction errors. The experimental results show that the performance of STEGT‐AE is significantly better than that of other advanced algorithms on four baseline datasets.
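
The abstract above mentions combining reconstruction and prediction errors from two decoders. A minimal sketch of how such a combined frame-level score could be computed is given below. The function names, the equal default weighting, and the use of plain mean squared error are assumptions made for illustration; they are not taken from the STEGT‐AE paper.

```python
from typing import Sequence

# One pose frame: a sequence of joints, each joint a sequence of coordinates.
Pose = Sequence[Sequence[float]]


def mse(a: Pose, b: Pose) -> float:
    """Mean squared error over all joint coordinates of two poses."""
    total, count = 0.0, 0
    for joint_a, joint_b in zip(a, b):
        for coord_a, coord_b in zip(joint_a, joint_b):
            total += (coord_a - coord_b) ** 2
            count += 1
    return total / count


def anomaly_score(
    observed: Pose,
    reconstructed: Pose,
    future: Pose,
    predicted: Pose,
    weight: float = 0.5,  # hypothetical mixing weight, not from the source
) -> float:
    """Weighted sum of reconstruction error and prediction error.

    `reconstructed` and `predicted` stand in for the outputs of the two
    decoders; higher scores indicate more anomalous behaviour.
    """
    reconstruction_error = mse(observed, reconstructed)
    prediction_error = mse(future, predicted)
    return weight * reconstruction_error + (1.0 - weight) * prediction_error
```

For instance, a perfectly reconstructed frame whose predicted next frame is off by 2 units in one coordinate of a single 2D joint gives 0.5 × 0 + 0.5 × 2 = 1.0 under these assumptions.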

“…The self-attention mechanism [24] is widely used in the field of object detection, it has not been applied to rice pest detection. Liu et al [25] proposed a video detection method for detecting abnormal human behavior. They used a spatial self-attention module, to understand the intra-frame relationship between various parts of the human body, and conducted experiments on large public dataset.…”

Section: Related Work (mentioning)

confidence: 99%

A Self-Attention Feature Fusion Model for Rice Pest Detection

Wang, Zhang, et al. 2022. IEEE Access.

To address the problem that existing deep learning methods are not sufficiently accurate to detect rice pests with changeable shapes or similar appearances, a self-attention feature fusion model for rice pest detection (SAFFPest) was proposed. The model was based on VarifocalNet. First, a deformable convolution module was added to the feature extraction network, to improve the feature extraction ability of pests with changeable shapes. Second, by obtaining the balance features of multiple feature maps, the self-attention mechanism was introduced to refine the balance feature, in order to better restore the semantic information of some pests with similar appearances. Subsequently, the group normalization method was used to replace the batch normalization method in the original model, to reduce the impact of batch size on model training. The IP102 rice pest dataset was used to train and verify this model. The experimental results showed that the model can accurately detect nine kinds of rice pests, such as rice leaf rollers and rice leaf caterpillars. Compared with FasterRCNN, RetinaNet, CP-FCOS, VFNet and BiFA-YOLO, the mean average precision of the model improved by 33.7%, 6.5%, 4.5%, 2.9% and 2% respectively.
