Human pose estimation often needs to be deployed on edge devices. Existing human pose estimation networks achieve good accuracy, but their complex structure leads to slow inference, which makes them unsuitable for real-time tasks, and their large model size hinders deployment. To address this issue, increasingly lightweight networks have been proposed in recent years, but they still lag behind traditional pose estimation networks in accuracy. We enhance the model's feature extraction and keypoint refinement capabilities by adding a spatial attention mechanism and using the transformer module Meta3D, which is more suitable for pose estimation tasks. We trained a lightweight network, CM-RTMpose, on the COCO dataset, where it achieved an AP of 69.8%, surpassing most existing lightweight pose estimation networks. To demonstrate the effectiveness of our method, we conducted ablation experiments on the COCO and OCHuman datasets. The experiments showed that our method achieved an AP of 66.1% on the OCHuman dataset, surpassing the baseline network.