Human key-point detection is a challenging research field in computer vision. Convolutional neural models limit the number of parameters and mine local structure, and they have made great progress in salient object detection and key-point detection. However, features extracted by shallow layers mostly lack semantic information, while features extracted by deep layers are rich in semantic information but lack spatial information, which leads to information imbalance and feature-extraction imbalance. As network structures grow more complex and the amount of computation increases, balancing communication time against computation time becomes increasingly important. Building on improvements in hardware, network running time has been greatly reduced by optimizing network structures and data operation methods. However, as networks become deeper, the communication cost between networks also increases, even as network computing capacity is optimized, and communication overhead has become a focus of recent attention. We propose a novel network structure, PGNet, which consists of three parts: a pipeline guidance strategy (PGS), a Cross-Distance-IoU loss (CIoU), and a Cascaded Fusion Feature Model (CFFM).