2020
DOI: 10.1186/s40537-020-00365-y

Deep anomaly detection through visual attention in surveillance videos

Abstract: This paper describes a method for learning anomalous behavior in video by finding an attention region from spatiotemporal information, in contrast to full-frame learning. In our proposed method, a robust background subtraction (BG) step is employed to extract motion, which indicates the location of attention regions. The resulting regions are then fed into a three-dimensional Convolutional Neural Network (3D CNN). Specifically, by taking advantage of C3D (Convolution 3-dimensional), to completely exploit sp…
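The abstract's idea of restricting learning to motion-derived attention regions can be illustrated with a minimal sketch. It is not the authors' implementation: the temporal-median background model, the `threshold` value, and the function names `attention_mask` and `keep_attention_region` are assumptions chosen for illustration, and the 3D CNN stage is omitted.

```python
import numpy as np

def attention_mask(frames, threshold=25.0):
    """Return a boolean (T, H, W) mask of pixels that differ from the background.

    Assumption: the background is estimated as the per-pixel temporal median
    of a grayscale clip; the paper's own background model may differ.
    """
    frames = np.asarray(frames, dtype=np.float32)
    background = np.median(frames, axis=0)  # static-scene estimate
    return np.abs(frames - background) > threshold

def keep_attention_region(frames, mask, fill=0.0):
    """Suppress pixels outside the mask so a downstream model sees only motion."""
    frames = np.asarray(frames, dtype=np.float32)
    return np.where(mask, frames, fill)

# Toy clip: a bright 2x2 patch moving one pixel to the right per frame.
clip = np.full((5, 8, 8), 10.0, dtype=np.float32)
for t in range(5):
    clip[t, 2:4, t:t + 2] = 200.0

mask = attention_mask(clip)
masked = keep_attention_region(clip, mask)
```

In this toy clip the masked frames keep only the moving patch; a clip-level model would then be given `masked` instead of the full frames.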

Cited by 56 publications (11 citation statements)
References 46 publications
“…Contrary to most of the works using the LSTM or 3D CNN for temporal modeling, Jain and Vishwakarma aggregated multiple frames called Dynamic Image (DI) to represent motion features in a single frame and fed it into 2D CNN [66]. Background subtraction was also used in violence detection to reduce the influence of disturbing backgrounds [36]. Su et al used graph convolution for violence detection with an assistance of a pose estimation model [12].…”
Section: There Have Been Various Violence Detection Methods With 2d
Citation type: mentioning
confidence: 99%
“…Kim et al computed the motion features with edge and color orientation histograms to generate a spatio-temporal saliency map [35]. Nasaruddin et al proposed background (BG) subtraction method to blur uninteresting areas in the surveillance video by using the binary bitmaps of each frame [36]. Some works utilized spatial and temporal attention modules in video action recognition to reduce redundant information over space and time [37]- [41].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…As explained in [15], a 4x4 block size yields the best texture information compared with the other block sizes, namely 2x2 and 6x6. In that comparison, the 2x2 block size removes too much texture information, while the 6x6 block shows unimportant detail in the resulting binary block.…”
Section: A Block Truncation Coding (Btc)
Citation type: unclassified
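To make the block-size remark concrete, here is a minimal sketch of the binary bitmap that standard block truncation coding produces: each block is thresholded at its own mean. The function name `btc_bitmap` and the requirement that the image dimensions be multiples of the block size are assumptions for illustration; the citing paper's exact procedure is not given in the statement above.

```python
import numpy as np

def btc_bitmap(image, block=4):
    """Return the BTC binary bitmap: True where a pixel >= its block's mean.

    Assumption: image height and width are exact multiples of `block`.
    """
    image = np.asarray(image, dtype=np.float32)
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image dimensions must be multiples of the block size")
    # View the image as a grid of (block x block) tiles.
    tiles = image.reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3), keepdims=True)  # one mean per tile
    return (tiles >= means).reshape(h, w)

# Toy image: the left 4x4 block has one bright pixel, the right block is flat.
img = np.zeros((4, 8), dtype=np.float32)
img[1, 2] = 16.0
img[:, 4:] = 5.0
bitmap = btc_bitmap(img, block=4)
```

Changing `block` to 2 (or to 6, for images whose dimensions allow it) shows how the block size trades texture detail against noise in the bitmap.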
“…Their model includes a ranking loss function and trains a fully connected neural network for decision-making. In a similar context, Nassarudin et al [29] presented a deep anomaly detection approach. They implemented a bilateral background subtraction, use the pretrained C3D model [24] for feature extraction, and attached a fully connected network to perform regression.…”
Section: Background and Related Work
Citation type: mentioning
confidence: 99%