Accurate, real-time video surveillance techniques for removing background variations in a video stream, which are highly correlated between frames, are at the forefront of modern data-analysis research. The objective of such algorithms is to highlight foreground objects of potential interest. Background/foreground separation is typically an integral step in detecting, identifying, tracking, and recognizing objects in video sequences. Most modern computer vision applications demand algorithms that can be implemented in real time and that are robust enough to handle diverse, complicated, and cluttered backgrounds. Competitive methods often need to be flexible enough to accommodate changes in a scene due to, for instance, illumination changes that occur throughout the day, or changes in the location where the application is deployed. Given the importance of this task, a variety of iterative techniques and methods have already been developed to perform background/foreground separation [4,8,11,15,23,24] (see also, for instance, the recent reviews by Bouwmans [2] and Benezeth et al. [1], which compare the error and timing of various methods).

One potential viewpoint of this computational task is as a matrix separation problem into low-rank (background) and sparse (foreground) components. Recently, this viewpoint has been advocated by Candès et al. in the framework of robust principal component analysis (RPCA) [4]. By weighting a combination of the nuclear and the L1 norms, a convenient convex optimization problem (principal component pursuit) was demonstrated, under suitable assumptions, to exactly recover the low-rank and sparse components of a given data matrix (or video for our purposes). It was also compared to the state-of-the-art computer vision procedure developed by De La Torre and Black [10].
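For reference, the principal component pursuit problem mentioned above is commonly written as follows. The symbols are notation assumed here rather than taken from the text: X is the data matrix of size n1 by n2 (one vectorized frame per column), L is the low-rank part, S is the sparse part, and λ is the weighting parameter, for which Candès et al. suggest the value shown.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \,\|S\|_{1}
\quad \text{subject to} \quad L + S = X,
\qquad \lambda = \frac{1}{\sqrt{\max(n_1, n_2)}}
```

Here the nuclear norm (the sum of the singular values of L) promotes low rank, and the entrywise L1 norm promotes sparsity in S.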
We advocate a similar matrix separation approach, but using the method of dynamic mode decomposition (DMD) [5,17,20,21,22,26] (see also Kutz [9] for a tutorial review). This method, which essentially implements a Fourier decomposition of the correlated spatial activity of the video frames in time, distinguishes the stationary background from the dynamic foreground by differentiating between the near-zero Fourier modes and the remaining modes bounded away from the origin, respectively [7]. Originally introduced in the fluid mechanics community, DMD has emerged as a powerful tool for analyzing the dynamics of nonlinear systems [5,17,20,21,22,26].

In the application of video surveillance, the video frames can be thought of as snapshots of some underlying complex/nonlinear dynamics. The DMD decomposition yields oscillatory time components of the video frames that have contextual implications. Namely, those modes that are near the origin represent dynamics that are unchanging, or changing slowly, and can be interpreted as stationary background pixels, or low-rank components of the data matrix. In contrast, those modes bounded away from the origin are changing on O(1) timescales or faster, and represent ...
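The separation idea described above can be sketched with a standard exact-DMD computation in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `dmd_separate`, the threshold `tol` on the continuous-time frequency, the rank cutoff, and the choice to take the foreground as the plain residual `X - L` are all hypothetical choices made for this example.

```python
import numpy as np

def dmd_separate(X, dt=1.0, rank=None, tol=1e-2):
    """Split a video matrix into slow (background) and residual (foreground) parts.

    X   : (n_pixels, n_frames) array, one vectorized frame per column.
    dt  : time between frames.
    rank: SVD truncation rank (assumed parameter; None keeps all
          numerically nonzero singular values).
    tol : modes with |omega| < tol are treated as background (assumed rule).
    """
    X1, X2 = X[:, :-1], X[:, 1:]          # snapshot pairs: X2 ~ A @ X1

    # Reduced SVD of the first snapshot matrix.
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    if rank is None:
        rank = int(np.sum(s > 1e-10 * s[0]))
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]

    # Projected linear operator; dividing by s scales column j by 1/s[j].
    X2_V_Sinv = X2 @ Vh.conj().T / s
    A_tilde = U.conj().T @ X2_V_Sinv

    # DMD eigenvalues, modes, and continuous-time frequencies.
    eigvals, W = np.linalg.eig(A_tilde)
    Phi = X2_V_Sinv @ W
    omega = np.log(eigvals.astype(complex)) / dt

    # Mode amplitudes fitted to the first frame.
    b = np.linalg.lstsq(Phi, X[:, 0].astype(complex), rcond=None)[0]

    # Time evolution of every mode, shape (rank, n_frames).
    t = np.arange(X.shape[1]) * dt
    dynamics = b[:, None] * np.exp(np.outer(omega, t))

    # Background: modes near the origin. Foreground: what remains.
    bg = np.abs(omega) < tol
    L = (Phi[:, bg] @ dynamics[bg]).real
    S = X - L
    return L, S, omega
```

On data made of a constant frame plus a single oscillating pattern, calling `dmd_separate(X, rank=3)` should return the constant frame in `L`. Real video would need the threshold and rank tuned, and the residual `S` may need further processing before it resembles a clean foreground.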
Gesture recognition is analyzed on a set of static hand gestures in the context of designing robust, real-time pre-processing techniques for applications in hand-held electronics. A comparative case study that uses various combinations of algorithms across the steps of the recognition process reveals that many method combinations can produce highly accurate results, even at low resolutions, given the right kind of pre-processing. The pre-processing includes the hand segmentation and normalization done before feature extraction. Indeed, pre-processing has by far the greatest effect on the overall accuracy, robustness, and speed of the gesture recognition process, significantly outweighing the influence of feature extraction and classification. Even at image resolutions as low as 8×8 pixels, accuracies of 99% are achieved using a simple PCA feature selection scheme and an LDA classification method. These results suggest the priority and advantages of focusing on developing robust and efficient pre-processing methods.
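The PCA-plus-LDA stage mentioned above can be illustrated with a small NumPy sketch. It is an assumption-laden example, not the study's pipeline: the function names (`fit_pca`, `pca_transform`, `fit_lda`, `predict_lda`) are hypothetical, the LDA is the textbook shared-covariance Gaussian classifier with a small ridge term added for numerical stability, and the pre-processing (segmentation and normalization) that the text identifies as the decisive step is not shown.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the training mean and the top principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project flattened images onto the principal directions."""
    return (X - mean) @ components.T

def fit_lda(Z, y):
    """Fit a linear discriminant classifier with a shared covariance."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance.
    centered = Z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(Z) - len(classes))
    cov += 1e-6 * np.eye(cov.shape[0])   # ridge term (assumed, for stability)
    priors = np.array([np.mean(y == c) for c in classes])
    coef = np.linalg.solve(cov, means.T).T            # (n_classes, n_features)
    intercept = -0.5 * np.sum(means * coef, axis=1) + np.log(priors)
    return classes, coef, intercept

def predict_lda(model, Z):
    """Assign each row of Z to the class with the largest linear score."""
    classes, coef, intercept = model
    return classes[np.argmax(Z @ coef.T + intercept, axis=1)]
```

With 8×8 images flattened to 64-element rows, one would call `fit_pca` on the training images, project both training and test images with `pca_transform`, and then use `fit_lda` and `predict_lda` on the projected features.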
Finding the best set of gestures to use for a given computer recognition problem is an essential part of optimizing the recognition performance while being mindful of those who may articulate the gestures. An objective function, called the ellipsoidal distance ratio metric (EDRM), for determining the best gestures from a larger lexicon library is presented, along with a numerical method for incorporating subjective preferences. In particular, we demonstrate an efficient algorithm that chooses the best n gestures from a lexicon of m gestures, where typically n ≪ m, using a weighting of both subjective and objective measures.