Hand pose estimation is the basis of dynamic gesture recognition. In vision-based hand pose estimation, accuracy suffers from the high flexibility of hand joints and from local similarity and severe occlusion among hand joints. In this paper, the structural relations between hand joints are established, and an improved nonparametric structure regularization machine (NSRM) is used to estimate hand pose more accurately. Starting from the NSRM network, the backbone is replaced by the new high-resolution net proposed in this paper to improve network performance, and the number of parameters is then decreased by reducing the input and output channels of some convolutional layers. Hand pose estimation experiments on a public dataset show that the improved NSRM network achieves higher accuracy and faster inference speed.
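The effect of reducing convolutional channels on parameter count can be illustrated with a short calculation. The following is a minimal sketch, not the paper's implementation: the helper `conv2d_params` and the channel widths (128 and 64) are hypothetical values chosen only for illustration, and the formula is the standard parameter count of a 2D convolution.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Count learnable parameters in a standard 2D convolution.

    Hypothetical helper for illustration; not taken from the paper.
    """
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Assumed channel widths, chosen only to show the scaling behaviour.
full = conv2d_params(128, 128, 3)  # original layer width (assumption)
slim = conv2d_params(64, 64, 3)    # input and output channels both halved

print(f"full: {full}, slim: {slim}, ratio: {full / slim:.2f}")
```

Because the weight count is proportional to the product of input and output channels, halving both reduces the layer's parameters by roughly a factor of four under these assumptions.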
Hand pose estimation is the basis of dynamic gesture recognition. In vision-based hand pose estimation, the joints of the human hand are highly flexible, and problems such as local similarity and severe occlusion strongly affect the estimation of hand posture. To identify complicated hand postures, this paper establishes the structural relationships between hand joints and achieves more accurate hand pose estimation through an improved Nonparametric Structure Regularization Machine (NSRM). Starting from the NSRM network, the backbone is replaced by the New High-Resolution Net (NHRNet), and the input and output channels of some convolutional layers are then reduced. Finally, hand pose estimation experiments are conducted on a public dataset. The experimental results show that the optimized NSRM network has higher accuracy and faster recognition speed for hand pose estimation.
Human pose estimation is an important research direction in computer vision, and transformer-based pose estimation algorithms have been favored for their excellent performance and low parameter count. Nonetheless, these algorithms suffer from high computational complexity and insensitivity to local details. To address these problems, a twin attention module is introduced into the TransPose model to improve efficiency and reduce resource consumption. Additionally, to address the poor recognition caused by insufficiently high-quality joint feature representations, the intra-level feature fusion module V block replaces the basic block in the third subnet of the CNN backbone in the TransPose model. The resulting improved pose estimation network is named VTTransPose. VTTransPose achieves AP scores of 76.5 and 73.6 on COCO val2017 and COCO test-dev2017, respectively, improvements of 0.4 and 0.2 over the original TransPose network. Moreover, the FLOPs of VTTransPose are reduced by 4.8G, the number of parameters is decreased by 2M, and memory usage during training is reduced by about 40%. The experimental results demonstrate that the proposed VTTransPose is more accurate, efficient, and lightweight than the original TransPose model.