2022
DOI: 10.3390/s22124502

Weakly Supervised Violence Detection in Surveillance Video

Abstract: Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras used in public and private areas is challenging for human operators. The manual nature of this task significantly increases the possibility of ignoring important events due to human limitations when paying attention to multiple targets at a time. Researchers have proposed several methods to detect violent events automatically to overcome this problem. So far, mos…

Cited by: 18 publications (5 citation statements)
References: 84 publications
“…It should be noted that DenseNet uses multi-layer feature concatenation for improved feature representation, but this approach requires more GPU memory and longer training times. Choqueluque-Roman et al [105] followed an approach that used an I3D architecture in combination with a ResNet50 for feature extraction using human action tubes for training a deep learning model based on MIL. Their results showed that, according to the accuracy and AUC metrics, our models achieved better performance with relatively fewer model parameters, which confirms that training based on MIL may not achieve high classification accuracy.…”
Section: Methods (mentioning)
confidence: 99%
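The statement above refers to training based on multiple-instance learning (MIL). A minimal sketch of the idea, assuming the common weakly supervised setup in which each video is a "bag" of segments and only video-level labels are available: a hinge ranking objective pushes the highest-scoring segment of a violent video above the highest-scoring segment of a non-violent one. The function names and the default margin of 1.0 are illustrative assumptions, not the method of the cited work or of this paper.

```python
from typing import Sequence


def mil_ranking_loss(pos_scores: Sequence[float],
                     neg_scores: Sequence[float],
                     margin: float = 1.0) -> float:
    """Hinge ranking loss between two bags of per-segment scores.

    pos_scores: scores for the segments of a video labeled violent.
    neg_scores: scores for the segments of a video labeled non-violent.
    Only the maximum score of each bag is compared (hypothetical sketch).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("each bag must contain at least one segment score")
    # Zero loss once the top positive segment beats the top negative one by `margin`.
    return max(0.0, margin - max(pos_scores) + max(neg_scores))


def bag_prediction(segment_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """MIL assumption: a video is violent if at least one segment is violent."""
    return max(segment_scores) >= threshold


# Example: 1.0 - 0.75 + 0.25 = 0.5
print(mil_ranking_loss([0.25, 0.75, 0.5], [0.25, 0.125]))
```

Because only the top-scoring segment of each bag receives a gradient signal, this objective can localize violent segments without frame-level labels, which is also why, as the quoted statement notes, it may trade off some video-level classification accuracy.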
“…Similarly, a conventional method was proposed in [13], utilizing motion cues derived from optical flow using RGB frames and incorporating appearance as low-level features. The system coded these qualities into a bag of words (BoWs) to eliminate redundant information, which eventually discovered violent behavior.…”
Section: A Machine Learning-based VD Approaches (mentioning)
confidence: 99%
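The bag-of-words encoding described above can be sketched as follows, assuming local motion or appearance descriptors have already been extracted and a codebook of "visual words" has already been learned (k-means is a common choice, but the quoted text does not say how the codebook is built). Each descriptor is assigned to its nearest codeword, and the video is represented by the normalized histogram of those assignments, which a downstream classifier would then consume. All function names here are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors: Sequence[Vector],
                  codebook: Sequence[Vector]) -> List[float]:
    """Encode a set of local descriptors as a normalized codeword histogram."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        counts[nearest_codeword(desc, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors (e.g. a static clip with no motion): return an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy 2-D example with a hypothetical two-word codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.0, 0.2]]
print(bow_histogram(descriptors, codebook))  # [0.5, 0.5]
```

Quantizing many redundant local descriptors into a fixed-length histogram is what discards per-descriptor detail while keeping the overall distribution of motion and appearance patterns.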
“…The system is used for the detection of crimes. In [18], the authors have proposed a weakly supervised method to detect spatial and temporal actions that are violent in the videos. They have used the fast -RCNN architecture that extracts the spatiotemporal information.…”
Section: Literature Review (mentioning)
confidence: 99%