This paper presents an adaptive crowd counting system for video surveillance applications. The proposed method is composed of a pair of collaborative Gaussian process models (GP) with different kernels, which are designed to count people by taking the level of occlusion into account. The level of occlusion is measured and compared with a predefined threshold for regression model selection for each frame. In addition, the proposed method dynamically identifies the best combination of features for people counting. The Mall and UCSD datasets are used to evaluate the proposed method. The results show that the proposed method offers a higher accuracy when compared against state of the art methods reported in open literature. The mean absolute error (MAE), mean squared error (MSE) and the mean deviation error (MDE) for the proposed algorithm are 2.90, 13.70 and 0.095, respectively, for the Mall dataset and 1.63, 4.32 and 0.066, respectively, for UCSD dataset.
Closed circuit television cameras (CCTV) are widely used in monitoring. This paper presents an intelligent CCTV crowd counting system based on two algorithms that estimate the density of each pixel in each frame and use it as a basis for counting people. One algorithm uses scale-invariant feature transform (SIFT) features and clustering to represent pixels of frames (SIFT algorithm) and the other uses features from accelerated segment test (FAST) corner points with SIFT features (SIFT-FAST algorithm). Each algorithm is designed using a novel combination of pixel-wise, motion-region, grid map, background segmentation using Gaussian mixture model (GMM) and edge detection. A fusion technique is proposed and used to validate the accuracy by combining the result of the algorithms at frame level. The proposed system is more practical than the state of the art regression methods because it is trained with a small number of frames so it is relatively easy to deploy. In addition, it reduces the training error, setup time, cost and open the door to develop more accurate people detection methods. The University of California (UCSD) and Mall datasets have been used to test the proposed algorithms. The mean deviation error, mean squared error and the mean absolute error of the proposed system are less than 0.1, 16.5 and 3.1, respectively, for the Mall dataset and less than 0.07, 5.5 and 1.9, respectively, for UCSD dataset.
<span>Automatic speech recognition (ASR) is a technology that allows a computer and mobile device to recognize and translate spoken language into text. ASR systems often produce poor accuracy for the noisy speech signal. Therefore, this research proposed an ensemble technique that does not rely on a single filter for perfect noise reduction but incorporates information from multiple noise reduction filters to improve the final ASR accuracy. The main factor of this technique is the generation of K-copies of the speech signal using three noise reduction filters. The speech features of these copies differ slightly in order to extract different texts from them when processed by the ASR system. Thus, the best among these texts can be elected as final ASR output. The ensemble technique was compared with three related current noise reduction techniques in terms of CER and WER. The test results were encouraging and showed a relatively decreased by 16.61% and 11.54% on CER and WER compared with the best current technique. ASR field will benefit from the contribution of this research to increase the recognition accuracy of a human speech in the presence of background noise.</span>
Abstract-Different technologies are employed to detect and count people in various situations but crowd counting system based on computer vision is one of the best choices due to a number of advantages. These include accuracy, flexibility, cost and acquiring people distribution information. Crowd counting system based on computer vision can use closed circuit television cameras (CCTV) that have already become ubiquitous and their uses are increasing. This paper aims to develop crowd counting system that can be incorporated with existing CCTV cameras. In this paper, the extracted low-level features in a frame-to-frame analysis are processed using regression technique to estimate the number of people. Two complex scenes and environments are used to evaluate the performances of the proposed system. The results have shown that the proposed system can achieve good performance in terms of the mean absolute error (MAE) and mean squared error (MSE).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.