Face parsing, the pixel-level segmentation of facial components, is pivotal for comprehensive facial analysis. However, previous studies have shown reduced performance on small or thin classes such as necklaces and earrings, and have struggled to adapt to occlusion scenarios such as masks, glasses, caps, or hands. To address these issues, this study proposes a robust face parsing technique that strategically integrates self-attention and self-distillation. The self-attention module enriches contextual information, enabling precise feature identification for each facial component. Multi-task learning for edge detection, coupled with a specialized loss function that focuses on edge regions, improves the understanding of fine structures and contours. Additionally, applying self-distillation during fine-tuning proves highly efficient, producing refined parsing results while maintaining high performance in scenarios with limited labels and ensuring robust generalization. The integration of self-attention and self-distillation addresses the challenges of previous studies, particularly in handling small or thin classes; this strategic fusion improves overall performance, achieves computational efficiency, and aligns with current trends in this research area. The proposed approach attains a mean F1 score of 88.18% on the CelebAMask-HQ dataset, marking a significant advance in face parsing with state-of-the-art performance. Even in challenging occlusion regions such as hands and masks, it achieves an F1 score above 99%, demonstrating robust face parsing capability in real-world environments.
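
To make the two training signals described above concrete, the following PyTorch-style sketch combines an edge-weighted segmentation loss, an auxiliary edge-detection objective, and a self-distillation term against a teacher's logits. This is a minimal illustration under our own assumptions, not the paper's implementation: the loss weights, the temperature, the teacher source (e.g., an earlier snapshot of the same network), and all function and parameter names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def parsing_loss(seg_logits, edge_logits, teacher_logits,
                 seg_target, edge_target,
                 edge_weight=2.0, distill_weight=0.5, temperature=4.0):
    """Illustrative combined loss: edge-weighted cross-entropy for parsing,
    auxiliary BCE for edge detection (multi-task learning), and a
    self-distillation KL term. All weights/temperature are placeholder
    values, not the paper's settings.

    Shapes: seg_logits/teacher_logits (B, C, H, W), edge_logits (B, 1, H, W),
    seg_target (B, H, W) long, edge_target (B, H, W) binary.
    """
    # Per-pixel cross-entropy for the parsing map.
    ce = F.cross_entropy(seg_logits, seg_target, reduction="none")  # (B, H, W)

    # Up-weight pixels on component boundaries so thin structures
    # (e.g., necklaces, earrings) contribute more to the loss.
    pixel_weight = 1.0 + edge_weight * edge_target.float()
    seg_loss = (ce * pixel_weight).mean()

    # Auxiliary edge-detection head trained jointly (multi-task learning).
    edge_loss = F.binary_cross_entropy_with_logits(
        edge_logits.squeeze(1), edge_target.float())

    # Self-distillation: match the softened class distribution of a
    # frozen teacher (assumed here to be a prior snapshot of the model).
    t = temperature
    distill_loss = F.kl_div(
        F.log_softmax(seg_logits / t, dim=1),
        F.softmax(teacher_logits.detach() / t, dim=1),
        reduction="batchmean") * (t * t)

    return seg_loss + edge_loss + distill_weight * distill_loss
```

In this sketch, the boundary up-weighting directs gradient signal toward the fine contours where small classes are lost, while the temperature-scaled KL term lets the student absorb the teacher's softened predictions, which is one common way self-distillation is used to stabilize fine-tuning when labels are limited.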