A multi-modal or multi-view dataset captures a subject from several sources (e.g., RGB and depth) at the same time. Combining different cues still faces many challenges, such as heterogeneous data and complementary information. In addition, existing methods for multi-modality recognition typically consist of discrete blocks: extracting features from separate data flows, combining the features, and classifying gestures. To address these challenges, we propose two novel end-to-end hand posture recognition frameworks, named the attention convolution module (ACM) and the gated concatenation module (GCM), which integrate all steps, from capturing various cues (RGB and depth images) to classifying hand gesture labels, into a single convolutional neural network (CNN) system. Both frameworks use a ResNet50 backbone pretrained on the ImageNet dataset. They are deployed, evaluated, and compared on various multi-modality hand posture datasets. Experimental results show that our proposed method outperforms state-of-the-art (SOTA) techniques.
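The abstract does not detail the internals of the ACM or GCM designs; as a rough illustration of one plausible gated concatenation fusion of two ResNet50 streams, here is a minimal PyTorch sketch. The gating layer, feature dimensions, and the 3-channel replication of depth maps are assumptions, not the authors' confirmed design.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GatedConcatFusion(nn.Module):
    """Hypothetical sketch of a gated concatenation module (GCM):
    a sigmoid gate learns per-channel weights over the concatenated
    RGB and depth features before classification."""

    def __init__(self, num_classes, feat_dim=2048):
        super().__init__()
        # Two ResNet50 backbones pretrained on ImageNet (as stated in
        # the abstract); the final fc layer is dropped from each.
        self.rgb_net = nn.Sequential(
            *list(models.resnet50(weights="IMAGENET1K_V1").children())[:-1])
        self.depth_net = nn.Sequential(
            *list(models.resnet50(weights="IMAGENET1K_V1").children())[:-1])
        # Assumed gating layer producing per-channel fusion weights.
        self.gate = nn.Sequential(
            nn.Linear(2 * feat_dim, 2 * feat_dim), nn.Sigmoid())
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, depth):
        # Depth maps are assumed replicated to 3 channels to match
        # the ResNet50 input convolution.
        f_rgb = self.rgb_net(rgb).flatten(1)        # (B, 2048)
        f_depth = self.depth_net(depth).flatten(1)  # (B, 2048)
        fused = torch.cat([f_rgb, f_depth], dim=1)  # (B, 4096)
        fused = self.gate(fused) * fused            # gated concatenation
        return self.classifier(fused)

# Usage sketch: num_classes is illustrative.
model = GatedConcatFusion(num_classes=24)
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
```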
In this study, we extensively analyze and evaluate the performance of recent deep neural networks (DNNs) for hand gesture recognition, and static gestures in particular. To this end, we captured an unconstrained hand dataset with complex appearances, shapes, scales, backgrounds, and viewpoints. We then deployed several recent convolutional neural networks (CNNs) for gesture classification and arrived at three major conclusions: i) the DenseNet121 architecture achieves the best recognition rate across almost all evaluated red, green, blue (RGB) and augmented datasets, outperforming most original works; ii) blender-based augmentation significantly increases accuracy, by 9% compared to using RGB cues alone; iii) most CNNs achieve impressive results, around 97% accuracy, when the training and testing datasets come from the same lab-based or constrained environment, but their performance drops drastically on gestures collected in unconstrained environments. In particular, we validated the best CNN on a new unconstrained dataset and observed a significant reduction, with an accuracy of only 74.55%. This performance can be improved to 80.59%, and up to an acceptable 83.17%, by strategies such as blender-based and/or GAN-based data augmentation. These findings identify crucial factors and offer practical recommendations for developing a robust hand-based interface in practice.
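The abstract gives no training details; as a rough illustration of how DenseNet121 is commonly fine-tuned for such a classification task, here is a minimal PyTorch sketch. The class count, learning rate, and helper name `train_step` are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical number of gesture labels in the dataset.
num_classes = 10

# DenseNet121 pretrained on ImageNet, with its classifier head
# replaced for the gesture classification task.
model = models.densenet121(weights="IMAGENET1K_V1")
model.classifier = nn.Linear(model.classifier.in_features, num_classes)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a batch of gesture images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```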