Abstract

Violence detection is mostly achieved through handcrafted feature descriptors, although some researchers have also employed deep learning-based representation models for violent activity recognition. Deep learning-based models have achieved encouraging results for fight activity recognition on benchmark data sets such as hockey and movies. However, these models have limitations in learning discriminative features for violence classification in the presence of abrupt camera motion. This work investigates deep representation models that use transfer learning to handle abrupt camera motion. Consequently, a novel deep multi-net (DMN) architecture based on AlexNet and GoogleNet is proposed for violence detection in videos. AlexNet and GoogleNet are top-ranked pre-trained models for image classification with distinct pre-learned features, so fusing them can yield superior performance. The proposed DMN unleashes this combined potential by coalescing both networks concurrently. The results confirm that DMN outperforms state-of-the-art methods by learning highly discriminative features, achieving 99.82% and 100% accuracy on the hockey and movies data sets, respectively. Moreover, DMN learns faster, i.e. 1.33 and 2.28 times faster than AlexNet and GoogleNet, respectively, which makes it an effective learning architecture for images and videos.
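The abstract does not spell out the fusion mechanics, so the following is only a minimal sketch of the general idea it describes: running pretrained AlexNet and GoogleNet in parallel as feature extractors and combining their outputs for two-class (violence / non-violence) frame classification. It uses PyTorch/torchvision; the class name DualNetFusion, the concatenation-based fusion, and the single linear head are illustrative assumptions, not the authors' exact DMN implementation.

```python
# A hedged sketch of dual-network transfer learning: two pretrained
# backbones share one input and their features are fused by concatenation.
import torch
import torch.nn as nn
from torchvision import models


class DualNetFusion(nn.Module):
    """Fuses AlexNet (4096-d) and GoogLeNet (1024-d) features, then
    classifies with a single fully connected layer (assumed head)."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        # Drop AlexNet's final 1000-way ImageNet layer to expose 4096-d features.
        self.alexnet.classifier[6] = nn.Identity()
        self.googlenet = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT)
        # Drop GoogLeNet's final 1000-way ImageNet layer to expose 1024-d features.
        self.googlenet.fc = nn.Identity()
        self.head = nn.Linear(4096 + 1024, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both branches see the same frame; fusion is a simple concatenation.
        f_alex = self.alexnet(x)      # (N, 4096)
        f_goog = self.googlenet(x)    # (N, 1024)
        return self.head(torch.cat([f_alex, f_goog], dim=1))


if __name__ == "__main__":
    model = DualNetFusion().eval()
    frames = torch.randn(4, 3, 224, 224)  # a batch of sampled video frames
    with torch.no_grad():
        print(model(frames).shape)         # torch.Size([4, 2])
```

In this setup only the small fusion head must be trained from scratch, which is consistent with the abstract's claim that the combined network converges faster than either backbone fine-tuned alone; the actual layer sizes and training schedule of DMN are not given in the abstract.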