Robust motion representations for action recognition have achieved remarkable performance in both controlled and 'in-the-wild' scenarios. Such representations are primarily assessed for their ability to label a sequence according to some predefined action classes (e.g. walk, wave, open). Although increasingly accurate, these classifiers are likely to label a sequence even if the action has not been fully completed, because the observed motion is similar enough to the training set. Consider the case where one attempts to drink but realises the beverage is too hot. A drinking-vs-all classifier is likely to recognise this action as drinking regardless. We introduce the term action completion as a step beyond the task of action recognition: it aims to recognise whether the action's goal has been successfully achieved. The notion of completion differs per action and could be infeasible to verify using a visual sensor; however, for many actions, an observer would be able to make the distinction by noticing subtle differences in motion.

We address incompletion with a supervised approach, using a new dataset that contains 414 complete and incomplete sequences, captured using a depth sensor and spanning six actions (switch, plug, open, pull, pick and drink). For each action, we varied the conditions so that the action could not be completed. For example, for plug, subjects were given a plug that does not match the socket, while for pull, a drawer was locked so it could not be pulled, and similarly for the other actions. Given labelled complete and incomplete sequences of the same action, we build a model of completion as a binary classifier for each action.

Since the notion of completion differs per action, a general action completion method should investigate the performance of different types of features to accommodate the various action classes. For example, for the action pick, the difference between complete and incomplete actions originates from the subtle change in body pose when holding an object, or from observing an object in the hand. For the action drink, on the other hand, the speed at which the action is performed better indicates completion. We propose a method that automatically selects the most discriminative feature(s) for recognising completion from a pool of depth features, using 'leave-one-person-out' cross-validation on the training set.

Figure 1: For a complete drink (green) and an incomplete drink (blue) sequence from our dataset, both are classified as drink by a drink vs. plug classifier (a); the proposed supervised action completion model (b) identifies the incomplete sequence.
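The selection step lends itself to a short sketch: score each candidate depth feature by leave-one-person-out cross-validation over the training subjects and keep the most discriminative one(s). This is a minimal sketch using scikit-learn; the feature pool, the linear SVM and all names are illustrative assumptions rather than the paper's exact pipeline.

```python
# Hedged sketch: per-action completion classification with automatic
# feature selection via leave-one-person-out cross-validation.
# The feature pool and classifier choice are assumptions for illustration.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

def select_completion_features(feature_pools, labels, person_ids):
    """Score each candidate depth feature and keep the best.

    feature_pools: dict mapping feature name -> (n_sequences, dim) array
    labels:        (n_sequences,) binary complete/incomplete labels
    person_ids:    (n_sequences,) subject identifier per sequence
    """
    logo = LeaveOneGroupOut()
    scores = {}
    for name, X in feature_pools.items():
        clf = SVC(kernel="linear")
        # Each fold holds out every sequence of one person, so selection
        # measures generalisation to unseen subjects.
        cv = cross_val_score(clf, X, labels, groups=person_ids, cv=logo)
        scores[name] = cv.mean()
    best = max(scores, key=scores.get)
    return best, scores
```

The grouping by subject, rather than a plain k-fold split, prevents a feature from scoring well merely because it memorises how a particular person moves.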
Parkinson’s disease (PD) is a chronic neurodegenerative condition that affects a patient’s everyday life. It has been proposed that a machine learning and sensor-based approach that continuously monitors patients in naturalistic settings can provide constant evaluation of PD and objectively analyse its progression. In this paper, we make progress toward such PD evaluation by presenting a multimodal deep learning approach for discriminating between people with PD and without PD. Specifically, our proposed architecture, named MCPD-Net, uses two data modalities, acquired from vision and accelerometer sensors in a home environment, to train variational autoencoder (VAE) models. These are modality-specific VAEs that predict effective representations of human movements to be fused and given to a classification module. During our end-to-end training, we minimise the difference between the latent spaces corresponding to the two data modalities. This makes our method capable of dealing with missing modalities during inference. We show that our proposed multimodal method outperforms unimodal and other multimodal approaches by an average increase in F1-score of 0.25 and 0.09, respectively, on a dataset with real patients. We also show that our method still outperforms other approaches by an average increase in F1-score of 0.17 when a modality is missing during inference, demonstrating the benefit of training on multiple modalities.
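A minimal PyTorch sketch may make the training objective concrete: one VAE per modality, a loss term that pulls the two latent spaces together, and a classifier on the fused latents. Layer sizes, loss weighting and all names here are illustrative assumptions, not the published MCPD-Net configuration.

```python
# Hedged sketch of a two-modality VAE with latent-space alignment.
# Dimensions and the equal loss weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityVAE(nn.Module):
    def __init__(self, in_dim, latent_dim=32):
        super().__init__()
        self.enc = nn.Linear(in_dim, 64)
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterise
        return self.dec(z), mu, logvar

def kl(mu, logvar):
    # KL divergence to a standard-normal prior
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

vae_vision = ModalityVAE(in_dim=128)   # assumed vision feature size
vae_accel = ModalityVAE(in_dim=64)     # assumed accelerometer feature size
classifier = nn.Linear(2 * 32, 2)      # PD vs. non-PD on fused latents

def training_loss(x_v, x_a, y):
    rec_v, mu_v, lv_v = vae_vision(x_v)
    rec_a, mu_a, lv_a = vae_accel(x_a)
    loss = F.mse_loss(rec_v, x_v) + F.mse_loss(rec_a, x_a)  # reconstruction
    loss = loss + kl(mu_v, lv_v) + kl(mu_a, lv_a)           # VAE priors
    loss = loss + F.mse_loss(mu_v, mu_a)                    # align latent spaces
    logits = classifier(torch.cat([mu_v, mu_a], dim=-1))    # fused prediction
    return loss + F.cross_entropy(logits, y)
```

Because the two latent spaces are aligned during training, a missing modality at inference time can be handled by substituting the available modality's latent for the absent one before fusion.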
Introduction
The impact of disease-modifying agents on disease progression in Parkinson’s disease is largely assessed in clinical trials using clinical rating scales. These scales have drawbacks in terms of their ability to capture the fluctuating nature of symptoms while living in a naturalistic environment. The SPHERE (Sensor Platform for HEalthcare in a Residential Environment) project has designed a multi-sensor platform with multimodal devices that allow continuous, relatively inexpensive, unobtrusive sensing of motor, non-motor and activities-of-daily-living metrics in a home or home-like environment. The aim of this study is to evaluate how the SPHERE technology can measure aspects of Parkinson’s disease.

Methods and analysis
This is a small-scale feasibility and acceptability study during which 12 pairs of participants (comprising a person with Parkinson’s and a healthy control participant) will stay and live freely for 5 days in a home-like environment embedded with SPHERE technology, including environmental, appliance-monitoring, wrist-worn accelerometry and camera sensors. These data will be collected alongside clinical rating scales, participant diary entries and expert clinician annotations of colour video images. Machine learning will be used to look for a signal to discriminate between Parkinson’s disease and control, and between Parkinson’s disease symptoms ‘on’ and ‘off’ medications. Additional outcome measures including bradykinesia, activity level, sleep parameters and some activities of daily living will be explored. Acceptability of the technology will be evaluated qualitatively using semi-structured interviews.

Ethics and dissemination
Ethical approval has been given to commence this study; the results will be disseminated as widely as appropriate.
Monitoring the progression of an action towards completion offers fine-grained insight into the actor's behaviour. In this work, we target detecting the completion moment of actions, that is, the moment when the action's goal has been successfully accomplished. This has potential applications from surveillance to assistive living and human-robot interaction. Previous work [14] required human annotation of the completion moment for training (i.e. full supervision). In this work, we present an approach for moment detection from weak video-level labels. Given both complete and incomplete sequences of the same action, we learn temporal attention along with an accumulated completion prediction over all frames in the sequence. We also demonstrate how the approach can be used when completion moment supervision is available. We evaluate and compare our approach on actions from three datasets, namely HMDB, UCF101 and RGBD-AC, and show that temporal attention improves detection in both weakly-supervised and fully-supervised settings.
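A rough sketch of how weak video-level labels can drive per-frame completion scores through temporal attention follows; the dimensions and the test-time read-out rule are our own assumptions, not the authors' released implementation.

```python
# Hedged sketch: temporal attention over per-frame completion scores,
# trained only against a video-level complete/incomplete label.
import torch
import torch.nn as nn

class TemporalAttentionCompletion(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)    # per-frame attention logit
        self.score = nn.Linear(feat_dim, 1)   # per-frame completion logit

    def forward(self, frames):                        # frames: (T, feat_dim)
        a = torch.softmax(self.attn(frames), dim=0)   # attention over time, (T, 1)
        s = torch.sigmoid(self.score(frames))         # per-frame completion, (T, 1)
        video_pred = (a * s).sum(dim=0)               # accumulated video-level prediction
        return video_pred, s.squeeze(-1)

# Training uses only the weak label: binary cross-entropy between
# video_pred and the complete/incomplete label of the whole sequence.
# One plausible test-time read-out (an assumption, not the paper's exact
# rule) is the first frame whose score exceeds 0.5 in a sequence
# predicted complete.
model = TemporalAttentionCompletion()
frames = torch.randn(40, 512)          # 40 frames of pre-extracted features
video_pred, frame_scores = model(frames)
```

Attention lets the video-level loss concentrate gradient on the frames that matter for completion, which is what allows a per-frame moment estimate to emerge without frame-level annotation.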