Efficiency in Human Actions Recognition in Video Surveillance Using 3D CNN and DenseNet

Baca, Herwin Alayn Huillcen; Caceres, Juan Carlos Gutierrez; Valdivia, Flor de Luz Palomino

doi:10.1007/978-3-030-98012-2_26

Cited by 5 publications (11 citation statements); references 27 publications

“…Extracting spatiotemporal features from three consecutive frames can be applied to recognize human actions, as was proposed by Huillcen et al. [29]. This method produces better efficiency results but still fails to surpass the state-of-the-art proposals in terms of effectiveness.…”

Section: Related Work (mentioning; confidence: 99%)

“…An improvement to the previous approach in terms of efficiency was presented by Huillcen et al. [44]. It uses a DenseNet architecture but with different configurations of dense layers and dense blocks to ensure the compactness of the model.…”

Section: Related Work (mentioning; confidence: 99%)
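The snippet attributes the model's compactness to the choice of dense layers and dense blocks. As a hedged illustration of why those two knobs matter, the sketch below applies the channel arithmetic of the original DenseNet design: a dense block of L layers with growth rate k turns k₀ input channels into k₀ + L·k, and a transition layer with compression factor θ keeps ⌊θ·c⌋ of them. The same bookkeeping holds for 3D convolutions, because the concatenation happens along the channel axis. The helper names and the compact configuration are hypothetical and are not the configurations used in the cited work.

```python
import math
from typing import List, Sequence, Tuple


def dense_block_channels(in_channels: int, num_layers: int, growth_rate: int) -> int:
    # Every dense layer appends `growth_rate` new feature maps to the running
    # concatenation, so a block of `num_layers` layers adds num_layers * growth_rate.
    return in_channels + num_layers * growth_rate


def densenet_channel_plan(
    stem_channels: int,
    layers_per_block: Sequence[int],
    growth_rate: int,
    compression: float = 0.5,
) -> List[Tuple[str, int, int]]:
    """Channel count after each dense block and transition layer (hypothetical helper)."""
    plan = []
    channels = stem_channels
    for i, num_layers in enumerate(layers_per_block, start=1):
        channels = dense_block_channels(channels, num_layers, growth_rate)
        plan.append(("block", i, channels))
        # Transition layers sit between blocks (not after the last one) and
        # shrink the channel count by the compression factor theta.
        if i < len(layers_per_block):
            channels = math.floor(compression * channels)
            plan.append(("transition", i, channels))
    return plan


# DenseNet-121-style layout (growth rate 32, blocks of 6/12/24/16 layers)
# versus a hypothetical compact layout with fewer, shallower blocks.
configs = {
    "densenet121_like": (64, (6, 12, 24, 16), 32),
    "compact_example": (32, (4, 4, 4), 12),
}
for name, (stem, layers, k) in configs.items():
    print(name, densenet_channel_plan(stem, layers, k)[-1])
```

The DenseNet-121-style layout ends at 1024 channels, while the compact example ends at 92, which shows how reducing the number of blocks, layers per block, and growth rate shrinks the feature width (and with it the parameter count of the following layers).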

“…It uses a DenseNet architecture but with different configurations of dense layers and dense blocks to ensure the compactness of the model. Later, Huillcen et al. [29] presented a new proposal based on extracting spatiotemporal features using a 2D CNN and extracting regions of interest to ensure model compactness.…”

Section: Related Work (mentioning; confidence: 99%)

Efficient Human Violence Recognition for Surveillance in Real Time

Huillcen Baca, Palomino Valdivia, Gutierrez Caceres (2024). Sensors. Self-citation.

Human violence recognition is an area of great interest in the scientific community due to its broad spectrum of applications, especially in video surveillance systems, because detecting violence in real time can prevent criminal acts and save lives. The majority of existing proposals and studies focus on result precision, neglecting efficiency and practical implementations. Thus, in this work, we propose a model that is effective and efficient in recognizing human violence in real time. The proposed model consists of three modules: the Spatial Motion Extractor (SME) module, which extracts regions of interest from a frame; the Short Temporal Extractor (STE) module, which extracts temporal characteristics of rapid movements; and the Global Temporal Extractor (GTE) module, which is responsible for identifying long-lasting temporal features and fine-tuning the model. The proposal was evaluated for its efficiency, effectiveness, and ability to operate in real time. The results obtained on the Hockey, Movies, and RWF-2000 datasets demonstrated that this approach is highly efficient compared to various alternatives. In addition, the VioPeru dataset was created, which contains violent and non-violent videos captured by real video surveillance cameras in Peru, to validate the real-time applicability of the model. When tested on this dataset, the effectiveness of our model was superior to the best existing models.

“…The codebook is then fed into a Recurrent Neural Network (RNN) with a Long Short-Term Memory (LSTM) classifier for sequential input and classification. A new real-time model for recognising human violence using deep learning (DL) has been proposed [28]. The model is made up of two modules: a spatial attention module that identifies spatial features and regions of interest using the frame difference between consecutive frames and morphological dilation, and a temporal attention module that identifies temporal features by averaging the RGB channels into a single channel and inputting three frames into a 2D CNN backbone.…”

Section: Related Work (mentioning; confidence: 99%)
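The two modules described in this snippet can be sketched with plain NumPy. This is a minimal, non-authoritative illustration: the difference threshold, the 3×3 structuring element, the way the mask is combined with the frames, and all function names are assumptions rather than details from [28], and the 2D CNN backbone itself is omitted.

```python
import numpy as np


def to_single_channel(frame: np.ndarray) -> np.ndarray:
    # Average the R, G and B channels of an H x W x 3 frame into one channel.
    # Casting to float32 first avoids uint8 wrap-around in later subtraction.
    return frame.astype(np.float32).mean(axis=2)


def binary_dilate(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    # Morphological dilation with a (2*radius+1) x (2*radius+1) square element:
    # a pixel is set if any pixel in its neighbourhood is set.
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def spatial_attention_mask(prev: np.ndarray, curr: np.ndarray,
                           threshold: float = 25.0, radius: int = 1) -> np.ndarray:
    # Regions of interest: pixels whose intensity changed between consecutive
    # frames by more than `threshold` (assumed value), grown by dilation.
    diff = np.abs(to_single_channel(curr) - to_single_channel(prev))
    return binary_dilate(diff > threshold, radius)


def temporal_input(frames) -> np.ndarray:
    # Three consecutive frames, each reduced to one channel, stacked into the
    # three input channels a 2D CNN backbone would normally receive as RGB.
    if len(frames) != 3:
        raise ValueError("expected exactly three consecutive frames")
    return np.stack([to_single_channel(f) for f in frames], axis=-1)


# Toy example: a single pixel changes on the second frame.
frames = [np.zeros((8, 8, 3), dtype=np.uint8) for _ in range(3)]
frames[1][4, 4] = 255
mask = spatial_attention_mask(frames[0], frames[1])
x = temporal_input(frames)
# Assumption: one simple way to combine the modules is to zero out
# everything outside the region of interest before the CNN.
x_roi = x * mask[..., None]
print(int(mask.sum()), x.shape)  # 9 (8, 8, 3)
```

In the toy run, the 3×3 dilation grows the single changed pixel into a 9-pixel region of interest, and the stacked input has the same H × W × 3 shape as an ordinary RGB frame, which is what lets a standard 2D CNN backbone consume three frames at once.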

“…The methodologies mentioned in [5–9,17–20,22,25,26,28] faced a shared challenge concerning the integration of new models into the existing framework. These methods require the existing models to be retrained from scratch, resulting in substantial demands on computational resources and time.…”

(mentioning; confidence: 99%)

Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection

Jebur, Hussein, Hoomod, et al. (2023). Computers.

Detecting violence in various scenarios is a difficult task that requires a high degree of generalisation. This includes fights in different environments such as schools, streets, and football stadiums. However, most current research on violence detection focuses on a single scenario, limiting its ability to generalise across multiple scenarios. To tackle this issue, this paper offers a new multi-scenario violence detection framework that operates in two environments: fighting in various locations and rugby stadiums. This framework has three main steps. Firstly, it uses transfer learning by employing three pre-trained models from the ImageNet dataset: Xception, Inception, and InceptionResNet. This approach enhances generalisation and prevents overfitting, as these models have already learned valuable features from a large and diverse dataset. Secondly, the framework combines features extracted from the three models through feature fusion, which improves feature representation and enhances performance. Lastly, the concatenation step combines the features of the first violence scenario with the second scenario to train a machine learning classifier, enabling the classifier to generalise across both scenarios. This concatenation framework is highly flexible, as it can incorporate multiple violence scenarios without requiring training from scratch with additional scenarios. The Fusion model, which incorporates feature fusion from multiple models, obtained an accuracy of 97.66% on the RLVS dataset and 92.89% on the Hockey dataset. The Concatenation model accomplished an accuracy of 97.64% on the RLVS and 92.41% on the Hockey datasets with just a single classifier. This is the first framework that allows for the classification of multiple violent scenarios within a single classifier. Furthermore, this framework is not limited to violence detection and can be adapted to different tasks.
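The fusion and concatenation steps in this abstract amount to two array concatenations: one along the feature axis (fusing backbones) and one along the sample axis (merging scenarios). The sketch below assumes pooled feature widths of 2048, 2048 and 1536 for Xception, InceptionV3 and InceptionResNetV2 (their global-average-pooled output sizes in Keras), uses synthetic random features in place of real backbone outputs, and substitutes a nearest-centroid classifier because the abstract does not name the classifier. It does not reproduce the paper's pipeline or its reported accuracies.

```python
import numpy as np

# Assumed pooled feature widths per ImageNet backbone (illustrative).
FEATURE_DIMS = {"xception": 2048, "inception_v3": 2048, "inception_resnet_v2": 1536}


def fuse_features(per_backbone: dict) -> np.ndarray:
    # Feature fusion: concatenate each backbone's (n, d_i) features along axis 1.
    return np.concatenate([per_backbone[name] for name in FEATURE_DIMS], axis=1)


def concatenate_scenarios(*scenarios):
    # Scenario concatenation: stack (features, labels) pairs from several
    # scenarios along the sample axis so one classifier sees all of them.
    X = np.concatenate([feats for feats, _ in scenarios], axis=0)
    y = np.concatenate([labels for _, labels in scenarios], axis=0)
    return X, y


class NearestCentroid:
    # Stand-in classifier: predicts the class whose mean feature vector is closest.
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


rng = np.random.default_rng(0)


def synthetic_scenario(n: int, shift: float):
    # Synthetic stand-in data (NOT real backbone features): class-1 samples
    # are shifted so that the two classes are separable.
    feats = {name: rng.normal(size=(n, d)) for name, d in FEATURE_DIMS.items()}
    labels = np.arange(n) % 2
    fused = fuse_features(feats)
    fused[labels == 1] += shift
    return fused, labels


X, y = concatenate_scenarios(synthetic_scenario(20, 1.0), synthetic_scenario(30, 1.0))
clf = NearestCentroid().fit(X, y)
accuracy = float((clf.predict(X) == y).mean())
print(X.shape, accuracy)
```

Under this reading, adding another violence scenario only appends rows to `X` and refits the lightweight classifier; the pretrained backbones and the fused feature width (5632 with the assumed dimensions) stay unchanged.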