The exponential rise of technology and its allied applications has continually pushed academia and industry toward more efficient and robust solutions for contemporary demands. Surveillance has remained a dominant area of interest for the scientific community, enabling real-time characterization of events or targets to support timely decision-making. Crowd behavior analysis and classification is one of the most sought-after, yet complex, capabilities for present-day surveillance. Unlike pedestrian movement detection methods, crowd analysis and behavioral characterization require robust feature learning and classification. With this motivation, this paper develops a highly robust model based on hybrid deep features that combine statistical features of the gray-level co-occurrence matrix (GLCM) with high-dimensional transferable deep-learning features from AlexNet. In addition, a multi-feed-forward neural network (MFNN) is used to perform multi-class classification. The inclusion of hybrid GLCM and AlexNet features provides deep spatio-temporal feature information that supports optimal classification decisions, while the MFNN enables optimal multi-class classification. The combined model with hybrid deep features and MFNN achieves crowd behavior classification with 91.35% accuracy, 89.92% precision, 88.34% recall and an F-measure of 89.12%.

Keywords: Crowd behavior analysis and classification • Hybrid deep features • GLCM • AlexNet • Fully connected layer (FC) • Deep learning • Multi-feed-forward neural network (MFNN) • F-measure

This article is part of the topical collection "Data Science and Communication" guest-edited by Kamesh Namudri, Naveen Chilamkurti, Sushma S. J. and S. Padmashree.
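As an illustration of the hybrid feature pipeline summarized above, the following is a minimal sketch of concatenating GLCM statistics with AlexNet deep features. It is not the authors' code: the GLCM property selection, preprocessing values and helper names (glcm_features, hybrid_features) are assumptions, with skimage and torchvision acting as stand-in implementations.

```python
# Hedged sketch: GLCM statistics fused with AlexNet deep features.
# All parameter choices below are illustrative assumptions.
import numpy as np
import torch
from torchvision import models, transforms
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_frame):
    """Statistical GLCM features from a uint8 grayscale frame."""
    glcm = graycomatrix(gray_frame, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "homogeneity"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# AlexNet with its final classification layer removed -> 4096-d deep features
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
alexnet.classifier = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])
alexnet.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def hybrid_features(rgb_frame, gray_frame):
    """Horizontally concatenate GLCM statistics with AlexNet activations."""
    with torch.no_grad():
        deep = alexnet(preprocess(rgb_frame).unsqueeze(0)).squeeze(0).numpy()
    return np.concatenate([glcm_features(gray_frame), deep])
```

The fused vector would then feed the MFNN classifier; the key point is that the low-dimensional texture statistics and the high-dimensional transferred features are joined by simple horizontal concatenation.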
The rapid emergence of advanced software systems, low-cost hardware and decentralized cloud computing has broadened the horizon for vision-based surveillance, monitoring and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions, confines the majority of at-hand vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type-sensitive spatio-temporal features for different crowd types under extreme conditions is a highly complex task; the resulting loss of accuracy and reliability limits existing methods for real-time crowd analysis. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. On the other hand, the strategic amalgamation of audio-visual features can enable accurate and reliable crowd analysis and classification. Motivated by this, this research develops a novel audio-visual, multi-modality-driven hybrid feature learning model for crowd analysis and classification. A hybrid feature extraction model extracts deep spatio-temporal features using the gray-level co-occurrence matrix (GLCM) and the AlexNet transferable learning model; once the GLCM features and AlexNet deep features are extracted, they are fused by horizontal concatenation. Similarly, for acoustic feature extraction, the audio samples (from the input video) are processed with static (fixed-size) sampling, pre-emphasis, block framing and Hann windowing, followed by extraction of acoustic features such as GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, spectral entropy, spectral flux, spectral slope and harmonics-to-noise ratio (HNR). Finally, the extracted audio-visual features are fused into a composite multi-modal feature set, which is classified with a random forest ensemble classifier. The multi-class classification yields a crowd-classification accuracy of 98.26%, precision of 98.89%, sensitivity of 94.82%, specificity of 95.57% and an F-measure of 98.84%. The robustness of the proposed multi-modality-based crowd analysis model confirms its suitability for real-world crowd detection and classification tasks.
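The fusion and classification stage described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: GTCC, spectral slope and HNR have no standard librosa call, so MFCC and its deltas stand in for the acoustic block, and the forest size and helper names (acoustic_features, fuse_and_classify) are assumed.

```python
# Minimal sketch of acoustic feature extraction plus audio-visual fusion
# and random forest classification. MFCC and its deltas stand in for the
# full GTCC/MFCC/spectral feature set; names and values are illustrative.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def acoustic_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])           # pre-emphasis
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # framed + windowed internally
    d1 = librosa.feature.delta(mfcc)                     # delta features
    d2 = librosa.feature.delta(mfcc, order=2)            # delta-delta features
    return np.vstack([mfcc, d1, d2]).mean(axis=1)        # clip-level summary vector

def fuse_and_classify(X_visual, X_audio, y):
    """X_visual: per-clip hybrid GLCM+AlexNet vectors; X_audio: acoustic vectors."""
    X = np.hstack([X_visual, X_audio])                   # composite multi-modal set
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    return clf.fit(X, y)
```

Late fusion of this kind keeps the two modalities independent until classification, so either branch can be evaluated or replaced on its own, which matches the ablation-style audio-only result reported below.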
Emerging computing technologies have broadened the horizon for vision-based surveillance, monitoring and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions, confines the majority of at-hand vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type-sensitive spatio-temporal features for different crowd types under extreme conditions is a highly complex task. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. In this research, a novel audio-based feature learning model is developed for crowd analysis and classification. The audio samples (from the input video) are processed with static (fixed-size) sampling, pre-emphasis, block framing and Hann windowing, followed by extraction of acoustic features such as GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, spectral entropy, spectral flux, spectral slope and harmonics-to-noise ratio (HNR). Finally, the extracted acoustic features are classified with a random forest ensemble classifier. The audio-based classification model yields an accuracy of 92.67%, precision of 93.80%, sensitivity of 82.91%, specificity of 90.48% and an F-measure of 0.9239.
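The audio front end named in both abstracts (pre-emphasis, fixed-size block framing, Hann windowing) reduces to a few lines of numpy. In the sketch below, the frame length, hop length and pre-emphasis coefficient are assumed values (25 ms / 10 ms at 16 kHz, 0.97), not parameters reported by the authors.

```python
# Numpy sketch of the audio front end: pre-emphasis, fixed-size block
# framing and Hann windowing. Frame/hop sizes and the 0.97 coefficient
# are assumptions chosen for illustration.
import numpy as np

def preprocess_audio(signal, frame_len=400, hop_len=160, alpha=0.97):
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1], boosting high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Block framing: overlapping fixed-size frames via index arithmetic
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len
    idx = (np.arange(frame_len)[None, :]
           + hop_len * np.arange(n_frames)[:, None])
    frames = emphasized[idx]
    # Hann window applied per frame before spectral feature extraction
    return frames * np.hanning(frame_len)
```

The windowed frames would then feed the GTCC/MFCC and spectral-feature extractors before random forest classification.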