“…Besides body joints, keypoints can also refer to small visual units that carry semantic information about the composition, shape, and pose of a target object, such as finger joints or key positions on any other object. Accurate keypoint detection in unconstrained environments therefore benefits more detailed visual understanding tasks, including semantic segmentation [1, 2, 3], salient object segmentation [4, 5, 6], hand segmentation [7] and pose estimation [8, 9], viewpoint estimation [10, 11, 12, 13], salient object detection [14, 15, 16], attention prediction [17], and 3D reconstruction [18, 19, 20]. As in many computer vision tasks, progress on human pose estimation has been significantly advanced by deep convolutional neural networks.…”