Multi-person pose estimation (MPPE) has achieved impressive progress in recent years. However, due to the large variance of appearances among images or occlusions, the model can hardly learn consistent patterns enough, which leads to severe location jitter and missing issues. In this study, we propose a novel framework, termed Inter-image Contrastive consistency (ICON), to strengthen the keypoint consistency among images for MPPE. Concretely, we consider two-fold consistency constraints, which include single keypoint contrastive consistency (SKCC) and pair relation contrastive consistency (PRCC). The SKCC learns to strengthen the consistency of individual keypoints across images in the same category to improve the category-specific robustness. Only with SKCC, the model can effectively reduce location errors caused by large appearance variations, but remains challenging with extreme postures (e.g., occlusions) due to lack of relational guidance. Therefore, PRCC is proposed to strengthen the consistency of pair-wise joint relation between images to preserve the instructive relation. Cooperating with SKCC, PRCC further improves structure aware robustness in handling extreme postures. Extensive experiments on kinds of architectures across three datasets (i.e., MS-COCO, MPII, CrowdPose) show the proposed ICON achieves substantial improvements over baselines. Furthermore, ICON under the semi-supervised setup can obtain comparable results with the fully-supervised methods using only 30% labeled data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.