Robust recognition of hand gestures in real-world applications remains an open problem, owing to challenges such as cluttered backgrounds and unconstrained environmental factors. In most existing methods, hand segmentation is a critical step that removes redundant background information before the hand gesture recognition stage. We therefore propose a new two-stage convolutional neural network (CNN) architecture, called HGR-Net: the first stage performs accurate pixel-level semantic segmentation to locate the hand regions, and the second stage identifies the hand gesture. The segmentation stage combines a fully convolutional residual network with atrous spatial pyramid pooling (ASPP). Although the segmentation sub-network is trained without depth information, it remains robust to challenges such as illumination variations and complex backgrounds. The recognition stage deploys a two-stream CNN that fuses information from the RGB and segmented images by combining their deep representations in a new fully connected layer before classification. Extensive experiments on public hand gesture datasets show that our architecture achieves performance close to the state of the art in segmentation and recognition of static hand gestures, at a fraction of the training time, run time, and model size.
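To make the two-stage design concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: a small fully convolutional segmentation sub-network with an ASPP-style block producing a soft hand mask, followed by a two-stream recognition sub-network that concatenates RGB and mask features in a fully connected fusion layer. The layer widths, depths, and 10-class output are illustrative assumptions, not the published HGR-Net configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convolutions."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class SegStage(nn.Module):
    """Stage 1: pixel-level hand segmentation (soft mask output)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.aspp = ASPP(64, 64)
        self.head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        mask = torch.sigmoid(self.head(self.aspp(self.backbone(x))))
        # Upsample the soft mask back to the input resolution.
        return F.interpolate(mask, size=x.shape[-2:], mode='bilinear',
                             align_corners=False)

def conv_stream(in_ch):
    """One recognition stream: conv features pooled to a single vector."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class RecStage(nn.Module):
    """Stage 2: two-stream recognition; deep representations are
    concatenated and fused in a fully connected layer before classification."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.rgb_stream = conv_stream(3)
        self.seg_stream = conv_stream(1)
        self.fuse = nn.Linear(64 + 64, 128)
        self.classify = nn.Linear(128, num_classes)

    def forward(self, rgb, mask):
        feats = torch.cat([self.rgb_stream(rgb), self.seg_stream(mask)], dim=1)
        return self.classify(F.relu(self.fuse(feats)))

seg, rec = SegStage(), RecStage()
rgb = torch.randn(2, 3, 128, 128)   # a batch of RGB gesture images
logits = rec(rgb, seg(rgb))         # stage 1 output feeds stage 2
print(logits.shape)                 # torch.Size([2, 10])
```

Feeding the predicted mask (rather than a hard binary crop) into the second stream keeps the whole pipeline differentiable, which is one plausible reading of how the two stages connect; the paper should be consulted for the exact fusion details.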
Evaluating neurological disorders such as Parkinson's disease (PD) is a challenging task that requires assessing several motor and non-motor functions. In this paper, we present an end-to-end deep learning framework that measures PD severity in two important components of the Unified Parkinson's Disease Rating Scale (UPDRS): hand movement and gait. Our method builds on an Inflated 3D (I3D) CNN trained within a temporal segment framework to learn both spatial and long-range temporal structure in video data. We also deploy a temporal attention mechanism to boost the performance of our model. Further, motion boundaries are explored as an extra input modality to mitigate the effects of camera motion for better movement assessment. We ablate the effects of different data modalities on the accuracy of the proposed network and compare against other popular architectures. We evaluate our method on a dataset of 25 PD patients, obtaining 72.3% and 77.1% top-1 accuracy on the hand movement and gait tasks, respectively.
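The following PyTorch sketch illustrates the temporal-segment-plus-temporal-attention idea from the abstract: the clip is split into equal segments, each snippet is encoded by a shared backbone, and an attention module produces a weighted consensus over segments. A tiny 3D conv network stands in for the I3D backbone here, and the segment count, feature width, and 5-level severity output (UPDRS items are scored 0-4) are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class SegmentEncoder(nn.Module):
    """Stand-in for an I3D backbone: encodes one video snippet."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )

    def forward(self, x):            # x: (B, C, T, H, W)
        return self.net(x)           # -> (B, dim)

class TemporalAttention(nn.Module):
    """Scores each segment and returns an attention-weighted consensus."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):        # feats: (B, S, dim) for S segments
        w = torch.softmax(self.score(feats), dim=1)   # (B, S, 1)
        return (w * feats).sum(dim=1)                 # (B, dim)

class SeverityNet(nn.Module):
    def __init__(self, num_levels=5, segments=4, dim=64):
        super().__init__()
        self.segments = segments
        self.encoder = SegmentEncoder(dim=dim)
        self.attend = TemporalAttention(dim)
        self.classify = nn.Linear(dim, num_levels)

    def forward(self, video):        # video: (B, C, T, H, W)
        # Temporal segment framework: split the clip along time, encode
        # each snippet with the shared backbone, then fuse by attention.
        snippets = video.chunk(self.segments, dim=2)
        feats = torch.stack([self.encoder(s) for s in snippets], dim=1)
        return self.classify(self.attend(feats))

model = SeverityNet()
clip = torch.randn(2, 3, 32, 64, 64)   # batch of short RGB clips
print(model(clip).shape)               # torch.Size([2, 5])
```

Motion boundaries could be handled in this framework by giving the encoder a second input stream (or extra channels) computed from the spatial gradients of optical flow; that fusion detail is left out of the sketch.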