Modelling pixels using a mixture of Gaussian distributions is a popular approach for background removal in video sequences. This approach works well for static backgrounds, where the assumption that pixels are independent of each other holds reasonably well; when the background is dynamic, however, this independence assumption breaks down and the method becomes much less effective. In this paper, we propose a generalisation of the algorithm that takes the spatial relationship between pixels into account: in essence, we model regions as mixture distributions rather than individual pixels. Through experimental verification on a variety of video sequences, we show that our method effectively models and subtracts backgrounds in scenes with complex dynamic textures.
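As a rough illustration of the per-pixel baseline this work generalises, the sketch below implements a Stauffer-Grimson-style Mixture-of-Gaussians update for a single greyscale pixel. The constants (K, ALPHA, T, the 2.5-standard-deviation match test, the initial variance of a new mode) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

K, ALPHA, T = 3, 0.01, 0.7   # modes per pixel, learning rate, background weight

def mog_update(x, w, mu, var):
    """One online MoG update for pixel value x.
    w, mu, var are length-K arrays (weights, means, variances), updated in place.
    Returns True if x is classified as foreground."""
    d2 = (x - mu) ** 2
    matched = d2 < (2.5 ** 2) * var          # within 2.5 standard deviations
    if matched.any():
        k = int(np.argmax(matched))          # first matching mode
        mu[k] += ALPHA * (x - mu[k])         # pull mean towards observation
        var[k] += ALPHA * (d2[k] - var[k])   # track observed variance
    else:
        k = int(np.argmin(w))                # replace the weakest mode
        mu[k], var[k] = x, 900.0             # new mode with a large variance
    w *= 1.0 - ALPHA                         # decay all weights
    w[k] += ALPHA                            # reinforce the winning mode
    w /= w.sum()
    # Background modes: highest weight/std-dev modes covering total weight T.
    order = np.argsort(-(w / np.sqrt(var)))
    background = set(order[np.cumsum(w[order]) <= T].tolist())
    return not (matched.any() and k in background)
```

Because each pixel carries its own (w, mu, var) state, nothing in this update looks at neighbouring pixels; that is precisely the independence assumption the region-based formulation removes.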
One of the most widely used techniques in computer vision for foreground detection is to model each background pixel as a Mixture of Gaussians (MoG). While this is effective for a static camera with a fixed or slowly varying background, it fails to handle fast, dynamic movement in the background. In this paper, we propose a generalised framework, called region-based Mixture of Gaussians (RMoG), which takes the spatial relationships between pixels into account. In our experimental comparisons, RMoG outperforms competing approaches in reducing false positives whilst still maintaining reasonable foreground definition. Lastly, using the ChangeDetection (CDNet 2014) benchmark, we evaluated RMoG on numerous surveillance scenes and found it to be amongst the leading performers for dynamic background scenes, whilst providing comparable performance for other commonly occurring surveillance scenes.
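A minimal way to see the region-based idea is to form one observation per image region and feed it through the same per-pixel machinery sketched above. The snippet below average-pools a greyscale frame into R x R blocks; this block-mean feature is an assumption for illustration, and the actual RMoG region statistics may differ.

```python
import numpy as np

def region_observations(frame, R=4):
    """Average-pool a 2-D greyscale frame into one value per R x R region,
    returning an (H // R, W // R) array of region observations."""
    H, W = frame.shape
    f = frame[:H - H % R, :W - W % R]        # crop to a multiple of R
    return f.reshape(H // R, R, W // R, R).mean(axis=(1, 3))
```

Each region value can then be modelled with its own mixture (e.g., via mog_update above), so locally dynamic texture is absorbed by the region statistic instead of triggering flicker at individual pixels.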
This paper introduces a momentum-like regularisation term for the region-based Mixture of Gaussians framework. Momentum terms have long been used in machine learning, especially in backpropagation, to speed up convergence and thereby improve performance. Here, we prove the convergence of the online gradient method with a momentum term and apply it to background modelling by incorporating it into the update equations of the region-based Mixture of Gaussians algorithm. Experimental evaluation on both simulated data and well-known video sequences shows that these regularised updates improve the performance of the algorithm.
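The update being regularised can be read as an online gradient step, and adding a momentum (heavy-ball) term gives the following sketch; BETA and the squared-error loss are illustrative assumptions rather than the paper's exact equations.

```python
BETA, ALPHA = 0.9, 0.01   # momentum coefficient, learning rate

def momentum_step(x, mu, v):
    """One online gradient update of a mode mean mu towards observation x,
    with velocity v carried across updates. Works on scalars or NumPy arrays."""
    grad = mu - x                  # gradient of the loss 0.5 * (mu - x)**2
    v = BETA * v - ALPHA * grad    # accumulate a momentum term
    return mu + v, v
```

With BETA = 0 this reduces to the familiar exponential-forgetting mean update mu += ALPHA * (x - mu); the momentum term smooths successive updates, which is the regularising effect analysed here.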
This paper presents a new framework for multi-subject event inference in surveillance video, where the measurements produced by low-level vision analytics are usually noisy, incomplete, or incorrect. Our goal is to infer the composite events undertaken by each subject from these noisy observations. To achieve this, we consider the temporal characteristics of event relations and propose a method to correctly associate detected events with individual subjects. The Dempster-Shafer (DS) theory of belief functions is used to infer events of interest from the results of our vision analytics and to measure conflicts that occur during event association. Our system is evaluated on a number of videos, at different levels of complexity, showing passenger behaviours on a public transport platform, namely buses. The experimental results demonstrate that, by reasoning with spatio-temporal correlations, the proposed method performs well when associating atomic events and recognising composite events involving multiple subjects in dynamic environments.
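For concreteness, Dempster's rule of combination, the core of DS evidence fusion, can be sketched as below. The frozenset-keyed mass functions and the event labels in the usage lines are hypothetical; they only illustrate how fused beliefs and the conflict mass (used here to measure association conflicts) are computed.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic belief assignments (dicts: frozenset -> mass).
    Returns the normalised fused assignment and the conflict mass."""
    fused, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:                              # agreeing evidence
            fused[inter] = fused.get(inter, 0.0) + ma * mb
        else:                                  # contradictory evidence
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in fused.items()}, conflict

# Hypothetical atomic-event evidence from two analytics modules:
m1 = {frozenset({"board"}): 0.7, frozenset({"board", "alight"}): 0.3}
m2 = {frozenset({"board"}): 0.6, frozenset({"alight"}): 0.4}
beliefs, k = dempster_combine(m1, m2)          # k = 0.28 here
```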