Smart meters allow the grid to interface with individual buildings and extract detailed consumption information using nonintrusive load monitoring (NILM) algorithms applied to the acquired data. Deep neural networks, which represent the state of the art for NILM, are affected by scalability issues, since they require high computational and memory resources, and by reduced performance when the training and target domains are mismatched. This article proposes a knowledge distillation approach for NILM, in particular for multilabel appliance classification, to reduce model complexity and improve generalization on unseen data domains. The approach uses weak supervision to reduce labeling effort, which is useful in practical scenarios. Experiments conducted on the U.K.-DALE and REFIT datasets demonstrated that a low-complexity network can be obtained for deployment on edge devices while maintaining high performance on unseen data domains. The proposed approach outperformed benchmark methods in unseen target domains, achieving an F1-score 0.14 higher than that of a benchmark model 78 times more complex.
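As a rough illustration of knowledge distillation for multilabel classification, the sketch below combines a soft-target term (student versus teacher) with a hard-label term (student versus labels). It is not the loss used in this article: the function names, the temperature of 2.0, the weighting factor `alpha`, and the use of binary cross-entropy for both terms are all assumptions chosen for illustration, and the weak-supervision aspect of the approach is not modeled.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over the appliance labels of one window."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hypothetical multilabel distillation loss (illustrative assumption).

    Each list holds one value per appliance for a single input window.
    alpha weights the soft-target term against the hard-label term.
    """
    # Teacher and student probabilities softened by the temperature.
    soft_targets = [sigmoid(z / temperature) for z in teacher_logits]
    soft_student = [sigmoid(z / temperature) for z in student_logits]
    # Unsoftened student probabilities, compared with the 0/1 labels.
    hard_student = [sigmoid(z) for z in student_logits]
    soft_term = bce(soft_targets, soft_student)
    hard_term = bce(labels, hard_student)
    return alpha * soft_term + (1.0 - alpha) * hard_term


# Example: two appliances, the first active and the second inactive.
teacher = [4.0, -4.0]
labels = [1.0, 0.0]
loss_agree = distillation_loss([4.0, -4.0], teacher, labels)
loss_disagree = distillation_loss([-4.0, 4.0], teacher, labels)
```

In this example, a student that agrees with the teacher and the labels (`loss_agree`) receives a lower loss than one that contradicts them (`loss_disagree`). Setting `alpha` to 0 reduces the sketch to ordinary supervised training on the labels alone.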