The development of vision-based human activity recognition and analysis systems has been of great interest to both the research community and practitioners over the last two decades. Traditional methods that require a human operator to watch raw video streams are nowadays deemed ineffective and expensive. New, smart solutions for automatic surveillance and monitoring have emerged, propelled by significant technological advances in image processing, artificial intelligence, electronics and optics, embedded computing, and networking, shaping the future of applications that can benefit from them, such as security and healthcare. The main motivation is to exploit the highly informative visual data captured by cameras and to perform high-level inference in an automatic, ubiquitous, and unobtrusive manner, so as to aid human operators or even replace them. This survey attempts to comprehensively review current research and development in vision-based human activity recognition. Synopses of various methodologies are presented in an effort to distill the advantages and shortcomings of the most recent state-of-the-art technologies. A first-level self-evaluation of the methodologies is also proposed, incorporating a set of significant features, weighted by importance, that best describe the most important aspects of each methodology in terms of operation, performance, and other criteria. The purpose of this study is to serve as a reference for further research and evaluation, and to raise thoughts and discussion on future improvements of each methodology toward maturity and usefulness.
Segmentation of human bodies in images is a challenging task that can facilitate numerous applications, like scene understanding and activity recognition. In order to cope with the high-dimensional pose space, scene complexity, and varied human appearances, the majority of existing works require computationally complex training and template matching processes. We propose a bottom-up methodology for the automatic extraction of human bodies from single images, in the case of almost upright poses in cluttered environments. The position, dimensions, and color of the face are used for the localization of the human body, the construction of models for the upper and lower body according to anthropometric constraints, and the estimation of the skin color. Different levels of segmentation granularity are combined to extract the pose with the highest potential. The segments that belong to the human body arise through the joint estimation of the foreground and background during the body part search phases, which alleviates the need for exact shape matching. The performance of our algorithm is measured using 40 images (43 persons) from the INRIA person dataset and 163 images from the "lab1" dataset, where the measured accuracies are 89.53% and 97.68%, respectively. Qualitative and quantitative experimental results demonstrate that our methodology outperforms state-of-the-art interactive and hybrid top-down/bottom-up approaches.

Index Terms—Adaptive skin detection, anthropometric constraints, human body segmentation, multilevel image segmentation.
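The face-driven localization step described above can be sketched as follows. The specific ratios used here (shoulders about three face widths wide, torso about three face heights, legs about 3.5 face heights) are illustrative assumptions, not the paper's exact anthropometric values:

```python
def body_boxes_from_face(fx, fy, fw, fh):
    """Derive upper- and lower-body search boxes from a detected face
    bounding box (fx, fy, fw, fh). Uses illustrative anthropometric
    ratios; coordinates have their origin at the top-left, y grows down.
    Each returned box is (x, y, width, height)."""
    body_w = 3.0 * fw              # assumed shoulder span: ~3 face widths
    cx = fx + fw / 2.0             # horizontal center of the face
    torso_top = fy + fh            # torso region starts below the chin
    torso_h = 3.0 * fh             # assumed torso height: ~3 face heights
    legs_h = 3.5 * fh              # assumed leg height: ~3.5 face heights
    upper = (cx - body_w / 2.0, torso_top, body_w, torso_h)
    lower = (cx - body_w / 2.0, torso_top + torso_h, body_w, legs_h)
    return {"upper": upper, "lower": lower}
```

These boxes would then seed the body part search over the different segmentation granularities, rather than serve as hard region boundaries.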
Image segmentation is one of the first and most important steps of image analysis and understanding. Evaluation of image segmentation, however, is a very difficult task, mainly because it requires human intervention and interpretation. In this work, we propose a blind (no-reference) evaluation scheme based on regional local–global (RLG) graphs, which aims at measuring the amount and distribution of detail in images produced by segmentation algorithms. The main idea derives from the field of image understanding, where image segmentation is often used as a tool for scene interpretation and object recognition. Evaluation here derives from the summarization of the structural information content, not from the assessment of performance after comparison with a gold standard. Results show measurements for segmented images acquired from three segmentation algorithms, applied to different types of images: human faces and bodies, natural environments, and structures such as buildings.
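To illustrate the idea of no-reference evaluation, the sketch below summarizes a label map's structural detail with two generic statistics (region count and region-size entropy). This is a deliberately simple stand-in for intuition only; it is not the RLG-graph measure the abstract describes:

```python
import math
from collections import Counter

def segment_detail_summary(labels):
    """Blind (no-reference) summary of a segmentation's detail:
    the number of regions and the Shannon entropy of region sizes.
    labels: 2D list of integer region labels, one per pixel."""
    counts = Counter(lab for row in labels for lab in row)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return {"regions": len(counts), "size_entropy": entropy}
```

Higher values of both statistics indicate more, and more evenly sized, regions; an over-segmented image scores high even without any ground-truth comparison.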
Detection of hands in single, unconstrained, monocular images is a very difficult task. Localization and extraction of the hand regions, however, provides important and useful knowledge that can facilitate many other tasks, such as gesture recognition, pose estimation, and action recognition. In this paper we present a simple appearance-based methodology that combines face detection and anthropometric constraints to efficiently estimate the position and regions of hands in images. It requires neither training nor explicit estimation of the human pose. Experimental results illustrate the performance of the approach.
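One common appearance-based ingredient for such face-driven hand localization is adaptive skin detection: fitting a color model to the pixels of the detected face and classifying the rest of the image against it. The sketch below shows a minimal per-channel Gaussian version; the threshold `k` and the modeling choices are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def skin_mask_from_face(img, face_box, k=2.5):
    """Adaptive skin detection sketch: fit a per-channel Gaussian to the
    pixels inside the detected face box, then mark every pixel whose
    color lies within k standard deviations of the face mean (on all
    channels) as skin.
    img: HxWx3 float array; face_box: (x, y, w, h)."""
    x, y, w, h = face_box
    face = img[y:y + h, x:x + w].reshape(-1, 3)
    mu = face.mean(axis=0)                 # mean face color
    sigma = face.std(axis=0) + 1e-6        # avoid division by zero
    dist = np.abs(img - mu) / sigma        # normalized per-channel distance
    return (dist < k).all(axis=2)          # True where color resembles the face
```

Candidate hand regions would then be skin-colored blobs that fall inside anthropometrically plausible reach areas around the detected face.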