In recent years, accurate prediction of age and gender from facial images has gained traction in fields such as personalized marketing, human–computer interaction, and security surveillance. However, the high computational cost of current models limits their practicality for real-time applications on resource-constrained devices. This study addresses this challenge by using knowledge distillation to develop lightweight age and gender prediction models that maintain high accuracy. We propose a knowledge distillation method that uses teacher bounds to train small age and gender prediction models efficiently. The method lets the student model receive the teacher model's knowledge selectively, so that it does not learn unconditionally from the teacher in challenging age and gender prediction cases involving factors such as illusions and makeup. Our experiments used MobileNetV3 and EfficientFormer as the student models and Vision Outlooker (VOLO)-D1 as the teacher model, and yielded substantial efficiency improvements. MobileNetV3-Small, one of the student models, achieved a 94.27% reduction in parameters and a 99.17% reduction in giga floating-point operations (GFLOPs). Furthermore, distillation improved the gender prediction accuracy of MobileNetV3-Small from 88.11% to 90.78%. Our findings show that knowledge distillation can improve model performance across diverse demographic groups while keeping models efficient enough for deployment on embedded devices. This research advances the development of practical, high-performance AI applications in resource-limited environments.
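To make the idea of selective transfer concrete, the following is a minimal sketch of a teacher-bounded regression term for age prediction. It follows the general teacher-bounded formulation from the distillation literature, in which the student is penalized only while its error is not yet lower than the teacher's by some margin. The abstract does not give the exact loss used in this work, so the function names, the `margin` and `weight` parameters, and the choice of an L1 ground-truth term are all assumptions made for illustration.

```python
from typing import Sequence


def teacher_bounded_term(student: float, teacher: float, target: float,
                         margin: float = 0.0) -> float:
    """Squared student error, counted only while the student has not yet
    beaten the teacher's squared error by `margin` (hypothetical helper)."""
    student_err = (student - target) ** 2
    teacher_err = (teacher - target) ** 2
    # The student already outperforms the teacher on this sample:
    # the teacher imposes no additional penalty.
    if student_err + margin <= teacher_err:
        return 0.0
    return student_err


def age_loss(students: Sequence[float], teachers: Sequence[float],
             targets: Sequence[float], margin: float = 0.0,
             weight: float = 0.5) -> float:
    """Mean over the batch of an L1 ground-truth term plus the weighted
    teacher-bounded term. `weight` is an assumed hyperparameter."""
    total = 0.0
    for s, t, y in zip(students, teachers, targets):
        hard = abs(s - y)  # supervision from the ground-truth age
        total += hard + weight * teacher_bounded_term(s, t, y, margin)
    return total / len(targets)
```

For example, `age_loss([30.0, 40.0], [35.0, 32.0], [31.0, 31.0])` evaluates to `25.25`: the first sample contributes only its ground-truth term because the student is already closer to the label than the teacher, while the second sample also incurs the bounded penalty because the student trails the teacher.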