2022
DOI: 10.1007/s00530-022-00915-9
Predicting skeleton trajectories using a Skeleton-Transformer for video anomaly detection

Cited by 19 publications (10 citation statements)
References 28 publications
“…From Table 1, it can be observed that the performance of our STEGT‐AE model on the ShanghaiTech and HR‐ST datasets surpasses that of several methods based on pixel‐level appearance features [6, 7, 10, 11, 20, 22, 39], optical flow features [17, 40], and their combination [9, 19]. Furthermore, as shown in Table 1, our model outperforms existing methods that utilise pose features [2, 13, 16, 24, 25, 41–43]. Specifically, compared to the ST‐GCN‐based prediction network Normal graph [24], our model achieves a 5.43% and 4.29% improvement in frame‐level AUC on the ShanghaiTech and HR‐ST datasets, respectively.…”
Section: Methods
confidence: 80%
“…Specifically, compared to the ST‐GCN‐based prediction network Normal graph [24], our model achieves a 5.43% and 4.29% improvement in frame‐level AUC on the ShanghaiTech and HR‐ST datasets, respectively. Compared to the Skeleton‐transformer method based on the Transformer model [43], the AUC value of our model is improved by 3.14% on the HR‐ST dataset. Compared to STGCAE‐LSTM [25], which also follows a single‐encoder dual‐decoder architecture, the frame‐level AUC values of our model are improved by 3.93% and 3.59% on the ShanghaiTech and HR‐ST datasets, respectively.…”
Section: Methods
confidence: 97%
“…As can be seen in Tables I and II, the existing skeletal video anomaly detection methods and available datasets focus on detecting irregular body postures [16] and anomalous human actions [30] in mostly outdoor settings, not in proper healthcare settings such as personal homes and long-term care homes. This is a gap for real-world deployment, as there is a need to extend the scope of detecting anomalous behaviours using skeletons to in-home and care-home settings, where privacy is a very important concern.…”
Section: Discussion
confidence: 99%
“…Fan et al. [29] proposed a GRU feedforward network trained to predict the next skeleton from past skeleton sequences, using a loss function that incorporated the range and speed of the predicted skeletons. Pang et al. [30] proposed a skeleton transformer to predict future pose components in video frames and used the error between the predicted pose components and their expected values as the anomaly score. They applied a multi-head self-attention module to capture long-range dependencies between arbitrary pairwise pose components and a temporal convolutional layer to concentrate on local temporal information.…”
Section: B. Prediction Approaches
confidence: 99%
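To make the prediction-based scoring described above concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' published code: pose components are embedded as tokens, a multi-head self-attention module captures long-range pairwise dependencies, a temporal convolution concentrates on local temporal information, and the error between the predicted and observed next pose serves as the anomaly score. The joint count, embedding size, and tensor layout are illustrative assumptions.

```python
# Hypothetical sketch of prediction-error anomaly scoring with a
# skeleton-transformer-style predictor (illustrative assumptions throughout).
import torch
import torch.nn as nn


class SkeletonPredictor(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.embed = nn.Linear(2, dim)  # embed each 2-D joint coordinate
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.tconv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.head = nn.Linear(dim, 2)   # regress the next (x, y) per joint

    def forward(self, poses):
        # poses: (batch, time, joints, 2) past skeleton sequence
        b, t, j, _ = poses.shape
        x = self.embed(poses).reshape(b, t * j, -1)     # one token per (frame, joint)
        x, _ = self.attn(x, x, x)                       # pairwise pose-component dependencies
        x = x.reshape(b, t, j, -1).permute(0, 2, 3, 1)  # (b, j, dim, t)
        x = self.tconv(x.reshape(b * j, -1, t))         # local temporal information
        x = x.reshape(b, j, -1, t)[..., -1]             # features at the last time step
        return self.head(x)                             # predicted next pose: (b, j, 2)


def anomaly_score(model, past, observed_next):
    # Error between the predicted and observed next pose, one score per sample;
    # larger error suggests motion that deviates from the learned normal patterns.
    with torch.no_grad():
        pred = model(past)
    return ((pred - observed_next) ** 2).mean(dim=(1, 2))


# Usage with illustrative shapes: 8 clips, 12 past frames, 17 joints.
# scores = anomaly_score(SkeletonPredictor(), torch.randn(8, 12, 17, 2),
#                        torch.randn(8, 17, 2))
```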
“…In this section, the experimental results of the proposed model are compared with those of existing abnormal behavior detection models on three datasets, using the same metric. Among the comparison models, unmasking [28] learned an effective classifier through sliding windows, the level set method [29] used level-set detection to extract image descriptors, sRNN [32] and sRNN-AE [35] used an RNN as the basic model, GAN_pred [33] used a GAN combined with U-Net, PST [36] used pose components for abnormal behavior detection, and the others were variant models based on autoencoders and U-Net. Table 3 shows the results of the proposed model and the comparison models on the three datasets.…”
Section: Quantitative Experiments
confidence: 99%