2019
DOI: 10.1109/access.2019.2954540

Spatio-Temporal Unity Networking for Video Anomaly Detection

Abstract: Anomaly detection in video surveillance is challenging due to the variety of anomaly types and definitions, which limit the use of supervised techniques. As such, auto-encoder structures, a type of classical unsupervised method, have recently been utilized in this field. These structures consist of an encoder followed by a decoder and are typically adopted to restructure a current input frame or predict a future frame. However, regardless of whether a 2D or 3D autoencoder structure is adopted, only single-scal…

Citations: cited by 49 publications (26 citation statements)
References: 20 publications
“…The input pattern of our model is different from the current methods that stack T sequential frames together into the model. Among these methods, the T frames are linked to each corresponding channel in the first output feature data, resulting in the collapse of temporal information [22]. Thus, we feed T frames into the encoder orderly to generate corresponding feature maps.…”
Section: Proposed Methods (mentioning)
confidence: 99%
“…The input mode of our network is different from existing methods that conventionally stack T consecutive frames together into a network. In these methods, all the T frames are connected to each channel in the first output feature map, which results in the collapse of temporal information [29]; thus, we input T frames into the encoder network one by one to generate corresponding feature maps. As shown in Figure 4, the DB-ConvLSTM structure includes a shallow forward layer and a deeper backward layer.…”
Section: Proposed Methods (mentioning)
confidence: 99%
“…During testing, two networks take each frame as input and the outputs jointly determine whether it is novel or not. Li et al. [29] propose a new anomaly score function and a spatio-temporal framework combined by U-net and adversarial learning. Similarly, Dong et al. [30] propose a new approach with a dual discriminator-based generative adversarial network and U-net structure.…”
Section: B. AAE-based Methods (mentioning)
confidence: 99%
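
Several of the statements above contrast stacking T frames as input channels with feeding the frames to the encoder one by one. The following is a minimal sketch of that distinction, not the cited papers' implementation. It assumes single-channel frames and stands in a 1x1 convolution (a per-pixel weighted sum over channels) for the first encoder layer; the function names `pointwise_conv`, `encode_stacked`, and `encode_per_frame`, the toy frames, and the weights are all hypothetical.

```python
def pointwise_conv(channels, weights):
    """1x1 convolution: each output map is a weighted sum of the input maps.

    channels: list of C maps, each an H x W nested list.
    weights:  list of F rows, each holding C weights.
    Returns a list of F output maps.
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [[sum(w * ch[y][x] for w, ch in zip(row, channels))
          for x in range(width)]
         for y in range(height)]
        for row in weights
    ]


def encode_stacked(frames, weights):
    """Treat the T frames as T input channels of a single input.

    Every output map mixes all T frames, so the result has no time axis.
    """
    return pointwise_conv(frames, weights)


def encode_per_frame(frames, weights):
    """Encode each frame separately with shared weights.

    The result keeps one group of feature maps per time step.
    """
    return [pointwise_conv([frame], weights) for frame in frames]


# Toy data (assumed): T = 3 frames of size 2 x 2, frame t filled with the value t.
frames = [[[float(t)] * 2 for _ in range(2)] for t in range(3)]

# Hypothetical weights with F = 2 output maps.
stacked_weights = [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]  # F x T
per_frame_weights = [[1.0], [2.0]]                      # F x 1

stacked = encode_stacked(frames, stacked_weights)        # shape: F x H x W
per_frame = encode_per_frame(frames, per_frame_weights)  # shape: T x F x H x W
```

In the stacked case the output holds F maps whatever the value of T, so later layers cannot index individual time steps. In the per-frame case the output keeps a leading axis of length T, which a recurrent module such as the DB-ConvLSTM mentioned above could then consume in order.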