2021
DOI: 10.1109/access.2021.3131315

Deep Learning for Automatic Violence Detection: Tests on the AIRTLab Dataset

Abstract: Following the growing availability of video surveillance cameras and the need for techniques to automatically identify events in video footages, there is an increasing interest towards automatic violence detection in videos. Deep learning-based architectures, such as 3D Convolutional Neural Networks, demonstrated their capability of extracting spatio-temporal features from videos, being effective in violence detection. However, friendly behaviours or fast moves such as hugs, small hits, claps, high fives, etc.…

Cited by 51 publications (20 citation statements)
References 31 publications

“…ROC curves for Hockey (top row), Crowd (second row), and AIRTLab (third row) are extracted from Fold 5, Fold 3, and Fold 2, respectively.
ViF [4] 0.8801; OViF [5] 0.9193; DiMOLIF [29] 0.9323; LBP+GLCM [3] 0.9360; HOMO [7] 0.9518; 3D CNN [1] 0.970; MoWLD [6] 0.9758; LHOG+LHOF [8] 0.9798; C3D+SVM [20] 0.9962; C3D+FC [20] 0.9927; ConvLSTM [20] 0.9931; Ours (Transformer) 0.954; Ours (LSTM) 0.976
Crowd: HOMO [7] 0.8284; ViF [4] 0.8804; DiMOLIF [29] 0.8925; OViF [5] 0.9182; LBP+GLCM [3] 0.93; MoWLD [6] 0.9408; ConvLSTM [20] 0.9443; LHOG+LHOF [8] 0.9703; 3D CNN [1] 0.98; C3D+FC [20] 0.9994; C3D+SVM [20] 1; Ours (Transformer) 0.916; Ours (LSTM) 0.934
AIRTLab: C3D+SVM [20] 0.993; C3D+FC [20] 0.9894; ConvLSTM [20] 0.9967; Ours (Transformer) 0.79; Ours (LSTM) 0.86…”
Section: Results and Analysis
Citation type: mentioning
confidence: 99%
“…MoWLD outperforms our LSTM, but only marginally. The top models in Hockey, Crowd, and AIRTLab are still C3D [20] and ConvLSTM [20] according to the table.…”
Section: Results and Analysis
Citation type: mentioning
confidence: 99%
“…In their study, P. Sernani et al. [97] proposed the AIRTLab dataset, which contains videos showing violence patterns performed by non-professional actors. They studied the use of 2D and 3D deep learning architectures for violence detection using their dataset and found that the studied models adapt well to their setting, where violence is mimicked by non-professional actors.…”
Section: A Datasets For Experiments
Citation type: mentioning
confidence: 99%
“…More recently, the authors of [19, 20] proposed the AIRTLab dataset, a small collection of 350 video clips labeled as “non-violent” and “violent,” where the non-violent actions include behaviors such as hugs and claps that can cause false positives in the violence detection task. Furthermore, the Surveillance Camera Fight dataset has been presented in [21].…”
Section: Related Work
Citation type: mentioning
confidence: 99%