In computer vision, human parsing is challenging because it demands accurate localization of human regions and fine-grained semantic partitioning. As a dense prediction task, it requires substantial computation and high-precision models; traditional models, despite their segmentation precision, are limited by computational complexity and large parameter counts. To enable real-time parsing on resource-limited devices, the authors introduced a lightweight model built on a ResNet18 backbone. They simplified the pyramid module, improving contextual representation while reducing complexity, and integrated a spatial attention fusion strategy to counter the precision loss incurred by light-weighting. To further improve the lightweight network's accuracy, the authors applied knowledge distillation (KD). Because conventional distillation can fail to transfer useful knowledge when the teacher and student networks differ significantly, they proposed a novel distillation approach based on inter-class and intra-class relations in the prediction outcomes, which noticeably improves parsing accuracy. Experiments on the Look into Person (LIP) dataset show that the lightweight model significantly reduces parameters while maintaining parsing precision and improving inference speed.
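The abstract does not give the exact form of the relation-based distillation loss, but the idea of matching inter-class relations in prediction outcomes can be sketched as follows. This is a minimal illustration, not the authors' formulation: it compares the class-correlation (Gram) matrices of student and teacher softmax outputs rather than the raw per-pixel probabilities, so the student is encouraged to reproduce how classes co-occur rather than to copy the teacher pixel by pixel.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_class_relation_loss(student_logits, teacher_logits):
    """Sketch of a relation-based distillation loss (illustrative only):
    match the C x C inter-class correlation matrices of student and
    teacher predictions over N pixels, instead of per-pixel outputs."""
    ps = softmax(student_logits)        # (N, C) student class probabilities
    pt = softmax(teacher_logits)        # (N, C) teacher class probabilities
    gs = ps.T @ ps / ps.shape[0]        # (C, C) student inter-class relations
    gt = pt.T @ pt / pt.shape[0]        # (C, C) teacher inter-class relations
    return float(np.mean((gs - gt) ** 2))
```

Because the loss is computed on C x C relation matrices, it is insensitive to the difference in capacity between the two networks, which is one plausible reason such relation-based signals transfer better across dissimilar architectures.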
The objective of human parsing is to partition the human in an image into its constituent parts, labeling each pixel according to a set of semantic classes. Since the human body comprises hierarchically structured parts, each body part has its own position distribution characteristics: a head is unlikely to appear below the feet, for example, and the arms are likely to be near the torso. Inspired by this observation, we construct class distributions by accumulating the original human parsing labels along the horizontal and vertical directions, and use them as additional supervision signals. Guided by these horizontal and vertical class distribution labels, the network learns to exploit the intrinsic position distribution of each class. We combine the two guided features into a spatial guidance map, which is superimposed onto the baseline network by multiplication and concatenation to distinguish human parts precisely. Extensive experiments on three well-known benchmarks, the LIP, ATR, and CIHP datasets, demonstrate the effectiveness and superiority of our method.
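The horizontal and vertical class-distribution labels described above can be derived directly from a ground-truth label map. The sketch below (an assumption about the construction, since the abstract gives no formulas) one-hot encodes the per-pixel labels and sums along each image axis, yielding one class histogram per row and per column, normalized into probability distributions:

```python
import numpy as np

def class_distributions(label_map, num_classes):
    """Accumulate a per-pixel label map into horizontal (per-row) and
    vertical (per-column) class distributions, usable as supervision."""
    one_hot = np.eye(num_classes)[label_map]   # (H, W, C) one-hot encoding
    horizontal = one_hot.sum(axis=1)           # (H, C): class counts per row
    vertical = one_hot.sum(axis=0)             # (W, C): class counts per column
    # Normalize each histogram into a probability distribution.
    horizontal /= np.clip(horizontal.sum(-1, keepdims=True), 1, None)
    vertical /= np.clip(vertical.sum(-1, keepdims=True), 1, None)
    return horizontal, vertical

# Toy 4x4 label map with 3 classes (0 = background, 1 = arm, 2 = torso).
labels = np.array([[0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [2, 2, 2, 2],
                   [0, 0, 0, 0]])
h_dist, v_dist = class_distributions(labels, num_classes=3)
```

Here `h_dist[r]` tells the network which classes may occur in row `r` and in what proportion, encoding the positional priors (heads high, feet low) that motivate the method.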