2021
DOI: 10.1142/s0218001422550023

Multi-Level Two-Stream Fusion-Based Spatio-Temporal Attention Model for Violence Detection and Localization

Abstract: Detection of violent human behavior is necessary for public safety and monitoring. However, it demands constant human observation and attention in human-based surveillance systems, which is a challenging task. Autonomous detection of violent human behavior is therefore essential for continuous uninterrupted video surveillance. In this paper, we propose a novel method for violence detection and localization in videos using the fusion of spatio-temporal features and attention model. The model consists of Fusion …
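The abstract is cut off before it describes the architecture, so the following is only a hypothetical, minimal sketch of the general idea it names: fusing appearance (spatial) and motion (temporal) features, then using attention to pick out the regions and frames that matter, which also yields a rough localization. Everything here is an assumption rather than the authors' model: the concatenation fusion, the dot-product attention with fixed query vectors, the logistic classifier, and the helper names `fuse`, `attend`, and `detect`. In a real system these parameters would be learned from data.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(items: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Dot-product attention pooling: weight each item by softmax(item . query)."""
    weights = softmax([dot(v, query) for v in items])
    pooled = [sum(w * v[d] for w, v in zip(weights, items)) for d in range(len(items[0]))]
    return pooled, weights


def fuse(rgb: Vector, flow: Vector) -> Vector:
    """Hypothetical two-stream fusion: concatenate appearance (RGB) and motion (flow) features."""
    return rgb + flow


def detect(
    clip_rgb: List[List[Vector]],
    clip_flow: List[List[Vector]],
    spatial_q: Vector,
    temporal_q: Vector,
    cls_w: Vector,
    bias: float = 0.0,
) -> Tuple[float, int, int]:
    """Return (violence probability, peak frame index, peak region index in that frame).

    clip_rgb[t][r] and clip_flow[t][r] are feature vectors for region r of frame t.
    """
    frame_vecs, spatial_maps = [], []
    for rgb_regions, flow_regions in zip(clip_rgb, clip_flow):
        regions = [fuse(a, b) for a, b in zip(rgb_regions, flow_regions)]
        vec, s_w = attend(regions, spatial_q)  # spatial attention within one frame
        frame_vecs.append(vec)
        spatial_maps.append(s_w)
    clip_vec, t_w = attend(frame_vecs, temporal_q)  # temporal attention across frames
    score = 1.0 / (1.0 + math.exp(-(dot(clip_vec, cls_w) + bias)))  # logistic classifier
    peak_frame = max(range(len(t_w)), key=t_w.__getitem__)
    peak_map = spatial_maps[peak_frame]
    peak_region = max(range(len(peak_map)), key=peak_map.__getitem__)
    return score, peak_frame, peak_region


if __name__ == "__main__":
    # Toy clip: 3 frames x 2 regions, 1-D RGB and 1-D flow features per region.
    rgb = [[[0.0], [0.0]], [[0.0], [0.0]], [[1.0], [0.0]]]
    flow = [[[0.0], [0.0]], [[0.0], [0.0]], [[2.0], [0.0]]]
    print(detect(rgb, flow, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]))
```

With the toy clip above, where only region 0 of frame 2 has strong appearance and motion responses, the attention weights concentrate on that frame and region, so the sketch reports it as the localized peak; this is a property of the made-up numbers, not a result from the paper. Concatenation is used only because it is the simplest fusion; the "multi-level" fusion named in the title is not modeled here.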

Cited by 8 publications (4 citation statements)
References 39 publications

“…4. Spatiotemporal attention network (Asad et al, 2022): This network integrates spatial and temporal information and introduces an attention mechanism to focus on crucial spatiotemporal regions in video sequences. In elderly behavior detection, the spatiotemporal attention network effectively captures spatial and temporal behavior patterns.…”
Section: Convolutional Neural Network (CNN; Ismail et al., 2023) | mentioning | confidence: 99%

“…Our work improves method [63] in that we both train a detection network for crowd counting. While training fully supervised detectors with bounding box annotations, we only train weakly supervised detectors with point-level annotations [64]. Unlike our method, which only focuses on the number of people, our goal is to predict the number of people and generate appropriately sized detection boxes.…”
Section: Related Work | mentioning | confidence: 99%

“…Department of Automation at Shanghai Jiao Tong University in Shanghai, China, created monitoring and surveillance technologies that rely entirely on automated detection and analysis. Based on "Multi-Stream 3D latent feature clustering for abnormality detection in videos" connected with CCTV at key locations, the full systems have been published and documented in [2] and [3]. The "Multi-Level Two Stream Fusion based Spatio-temporal Attention Model for Violence Detection and Localization" is used by the second system.…”
Section: Introduction | mentioning | confidence: 99%