Despite recent advances in deep learning, the rise of edge devices and the exponential growth of Internet of Things (IoT) connected devices strain the performance of deep learning models, as computing increasingly moves to the edge. Autonomous vehicles have leveraged computer vision, especially object detection, to navigate traffic safely. To drive on all types of roads, however, these vehicles must be equipped with a road anomaly detection system, which creates a need for small deep learning models that detect road anomalies and can be deployed on the vehicles for a safer driving experience. Current deep learning models are impractical to deploy on embedded devices because of their heavy resource requirements. This paper proposes a theoretical approach, based on knowledge distillation, to building a lightweight model suitable for edge devices from a cumbersome pothole detection model. It presents the theory of knowledge distillation and explains why it is a better model compression technique than the alternatives. It shows that a cumbersome model can be made lightweight without sacrificing accuracy, with reduced time complexity and faster training.