The World Health Organization (WHO) has declared the spread of coronavirus disease (COVID-19) a global pandemic and has recommended wearing a face mask, including at work, as an effective way to reduce the risk of infection. The pandemic led governments worldwide to impose lockdowns to limit virus transmission, and reports show that wearing face masks reduces the risk of transmission. As urban populations grow, efficient city management becomes increasingly important for reducing the impact of COVID-19. For smart cities to prosper, significant improvements will be needed in public transportation, roads, businesses, houses, city streets, and other facets of city life. In particular, the current public bus transportation system should be augmented with artificial intelligence: an autonomous mask detection and alert system is needed to determine whether a person is wearing a face mask. This article presents a novel IoT-based face mask detection system for public transportation, especially buses. The system collects real-time data via facial recognition. The main objective of the paper is to detect the presence of face masks in a real-time video stream using deep learning, machine learning, and image processing techniques. To achieve this objective, a hybrid deep and machine learning model was designed and implemented. The model was evaluated on a new dataset in addition to public datasets. The results showed that the transformed Convolutional Neural Network (CNN) classifier outperforms the Deep Neural Network (DNN) classifier; it identifies the faces of people wearing masks almost perfectly, with an error rate of only 1.1%. Overall, the proposed model performed better than the standard models AlexNet, MobileNet, and You Only Look Once (YOLO). Moreover, the experiments showed that the proposed model can detect faces and masks accurately with low inference time and memory usage, thus meeting the limited-resource constraints of IoT devices.