Human activity recognition plays a central role in intelligent systems for video surveillance, public security, health care and home monitoring, where detecting and recognizing activities can improve human safety and quality of life. Such systems typically need to be automated, intuitive and real-time, recognizing human activities and accurately identifying unusual behaviors in order to prevent dangerous situations. In this work, we explore the combination of three modalities (RGB, depth and skeleton data) to design a robust multi-modal framework for vision-based human activity recognition. In particular, spatial information, body shape/posture and the temporal evolution of actions are highlighted through illustrative representations obtained by combining dynamic RGB images, dynamic depth images and skeleton data representations. Each video is thus represented by three images that summarize the ongoing action. Our framework leverages transfer learning from pre-trained models to extract significant features from these newly created images. We then fuse the extracted features using Canonical Correlation Analysis and train a Long Short-Term Memory (LSTM) network to classify actions from the visual descriptive images. Experimental results demonstrate the reliability of our feature-fusion framework, which captures highly discriminative features and achieves state-of-the-art performance on the public UTD-MHAD and NTU RGB+D datasets.
According to the World Health Organization, falling is a major health problem that causes thousands of deaths every year. Fall detection and fall prediction are both important tasks that must be performed efficiently to provide accurate medical assistance to vulnerable populations whenever required, and to allow local authorities to plan daily health care resources and reduce fall damage accordingly. In this paper we present a fall detection approach that exploits the human body geometry available in the frames of a video sequence. Specifically, the angle and the distance between the vector formed by the head (the centroid of the detected facial region) and the center hip of the body, and the vector aligned with the horizontal axis through the center hip, are used to construct distinctive image features. A two-class SVM classifier is trained on the newly constructed feature images, while a Long Short-Term Memory (LSTM) network is trained on the calculated angle and distance sequences to classify fall and non-fall activities. We perform experiments on the Le2i fall detection dataset and the UR FD dataset. The results demonstrate the effectiveness and efficiency of the developed approach.
According to the World Health Organization, falls among the elderly are a major health problem that causes many injuries and thousands of deaths every year. This increases the pressure on health authorities to provide daily health care and reliable medical assistance, reduce fall damage and improve the quality of life of the elderly. Detecting or predicting falls accurately is therefore a priority. In this paper, we present a fall detection approach based on human body geometry inferred from video sequence frames. We calculate the angle between the vector formed by the head centroid of the identified facial image and the center hip of the body, and the vector aligned with the horizontal axis through the center hip. Similarly, we calculate the distance between the head and the body's center hip, and use these measurements to construct distinctive image features. The angles and distances are then used to train a two-class SVM classifier, while a Long Short-Term Memory (LSTM) network is trained on the calculated angle sequences to classify fall and non-fall activities. We perform experiments on the Le2i fall detection dataset. The results demonstrate the effectiveness and efficiency of the developed approach.
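The head-hip geometry described in both abstracts can be sketched with a small helper. The coordinate convention (image pixels, y increasing downward), the function name and the angle convention are hypothetical illustrations, not the papers' exact definitions.

```python
import math

def head_hip_features(head, hip):
    """Return (angle, distance) for the head-to-hip vector.

    angle: degrees between the head-hip vector and the horizontal axis
    through the hip (hypothetical convention: folded into [0, 90]).
    distance: Euclidean length of the head-hip vector.
    Points are (x, y) pixel coordinates, y increasing downward.
    """
    dx = head[0] - hip[0]
    dy = head[1] - hip[1]
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))
    dist = math.hypot(dx, dy)
    return angle, dist

# Upright posture: head directly above the hip -> angle near 90 degrees.
print(head_hip_features((100, 40), (100, 160)))   # (90.0, 120.0)
# After a fall: head nearly level with the hip -> angle near 0 degrees.
print(head_hip_features((180, 155), (100, 160)))
```

A sequence of such (angle, distance) pairs, one per frame, is the kind of signal the LSTM would consume, while per-frame feature images derived from them would feed the SVM.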