Despite their success, most existing human pose estimation approaches rely on complex architectures with high computational cost and lack lightweight modules. To address this problem, this paper proposes a Ghost Shuffle Lightweight Pose Network (GSLPN), whose architecture is lighter and more efficient than that of the popular Lightweight Pose Network. First, to reduce the size of the network while maintaining its performance, we stack two lightweight modules, depthwise convolution and the Ghost module, to build an initial prototype bottleneck. We then apply a channel shuffle operation to this prototype bottleneck, reordering its feature maps to construct the Ghost Shuffle Bottleneck, which provides an effective feature representation and serves as the building block of GSLPN. Second, we propose a lightweight and efficient parallel attention mechanism, Lightweight Pose Parallel Attention, to improve keypoint localisation accuracy. Experimental results show that GSLPN achieves competitive performance with a smaller model size and lower computational complexity than state-of-the-art methods, indicating that GSLPN is a superior approach to human pose estimation.
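The abstract names three building blocks borrowed from earlier lightweight networks: depthwise convolution, the Ghost module, and channel shuffle. The following minimal sketch in plain Python illustrates them under stated assumptions: a ShuffleNet-style channel shuffle (view C channels as a groups × (C/groups) grid, transpose, flatten) and the weight counts of a standard convolution, a depthwise convolution, and a Ghost module, using the cost formulation from GhostNet with ratio s and cheap-operation kernel size d. Channels are represented as plain list items rather than tensors, all function names and layer sizes are hypothetical, bias and batch-normalisation parameters are ignored, and the exact configuration of the Ghost Shuffle Bottleneck (kernel sizes, group count, ordering of the depthwise and Ghost layers) is not given in this section, so the numbers are illustrative only.

```python
"""Illustrative sketch (not the GSLPN implementation): channel shuffle and
weight counts for the lightweight building blocks named in the abstract."""


def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle on a list of per-channel feature maps.

    The C channels are viewed as a (groups x C/groups) grid in row-major order,
    the grid is transposed, and the result is flattened, so channels from
    different groups become interleaved.
    """
    num_channels = len(channels)
    if groups <= 0 or num_channels % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = num_channels // groups
    # Reading the grid column by column is equivalent to transpose + flatten.
    return [channels[g * per_group + i]
            for i in range(per_group) for g in range(groups)]


def standard_conv_params(c_in, c_out, k):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_conv_params(channels, k):
    """Weights of a k x k depthwise convolution: one filter per channel (no bias)."""
    return k * k * channels


def ghost_module_params(c_in, c_out, k, s, d):
    """Weights of a Ghost module following the GhostNet cost formulation.

    A primary k x k convolution produces c_out / s intrinsic maps; a cheap
    d x d depthwise operation then generates the remaining (s - 1) * c_out / s
    "ghost" maps from them. Biases and batch normalisation are ignored.
    """
    if s < 1 or c_out % s != 0:
        raise ValueError("c_out must be divisible by a ratio s >= 1")
    intrinsic = c_out // s
    primary = standard_conv_params(c_in, intrinsic, k)
    cheap = (s - 1) * intrinsic * d * d
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    print(channel_shuffle(list(range(6)), groups=2))    # [0, 3, 1, 4, 2, 5]
    print(standard_conv_params(64, 128, k=3))           # 73728
    print(ghost_module_params(64, 128, k=3, s=2, d=3))  # 37440
    print(depthwise_conv_params(64, k=3))               # 576
```

With s = 2, the Ghost module in this example needs roughly half the weights of the standard convolution (37,440 versus 73,728), in line with GhostNet's observation that the compression ratio approaches s. The channel shuffle adds no weights at all; it only permutes channels so that information can flow between channel groups.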
KEYWORDS
ghost shuffle lightweight module, human pose estimation, lightweight pose parallel attention
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

INTRODUCTION

The task of two-dimensional (2-D) human pose estimation is to localise human anatomical keypoints (e.g., elbow, wrist) or body parts in an image; it is fundamental to a variety of vision applications, including human action recognition [1-3], kinematics analysis [4], human-computer interaction [5,6], and animation. Owing to complex backgrounds, the nonrigid nature of the human body, and occlusions, human pose estimation is a challenging but important problem. Many models, such as Pictorial Structure (PS)-based models [7,8], probabilistic graphical models [9], and deep convolutional neural network (DCNN) models [10-23], have been proposed to address these difficulties and achieve accurate and robust pose estimation; among them, DCNN-based methods have made considerable progress in diverse natural environments.

However, most existing approaches concentrate on improving pose estimation accuracy and therefore build complex networks in pursuit of high performance, resulting in numerous parameters and a large number of floating-point operations (FLOPs). Such models suffer from two problems: (1) the numerous parameters incur a high memory cost, and (2) the large number of matrix multiplications incurs a high computational cost during network inference. Therefore, high-