Statistical data indicate that mobile-source pollutants have become a significant contributor to atmospheric pollution, and vehicle tailpipe emissions are the dominant component of these pollutants. Because the motion shadows cast by motor vehicles visually resemble emitted black smoke, this study focuses on the interference of motion shadows in the detection of black smoke vehicles. First, the YOLOv5s model is used to locate moving objects, including motor vehicles, motion shadows, and black smoke emissions. The extracted images of these moving objects are then processed with simple linear iterative clustering (SLIC) to obtain superpixel images of the three categories for model training. Finally, the superpixel images are fed into a lightweight MobileNetv3 network to build a black smoke vehicle detection model for recognition and classification. Rather than following the traditional “detect first, then remove” strategy for overcoming shadow interference, this study adopts a “segmentation-classification” approach that directly handles the coexistence of motion shadows and black smoke emissions. Experimental results show that the Y-MobileNetv3 model, which takes motion shadows into account, achieves an accuracy of 95.17%, a 4.73% improvement over the N-MobileNetv3 model (which does not consider motion shadows), with an average single-image inference time of only 7.3 ms. The superpixel segmentation algorithm effectively clusters similar pixels, aiding the detection of faint black smoke emissions from motor vehicles. The Y-MobileNetv3 model thus improves the accuracy of black smoke vehicle recognition while meeting real-time detection requirements.
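The sketch below illustrates the described detection-segmentation-classification pipeline under stated assumptions: YOLOv5s loaded via torch.hub, scikit-image's SLIC for superpixel generation, and torchvision's MobileNetV3-Small as the classifier. The file name, class labels, and segmentation parameters are illustrative placeholders; the paper's Y-MobileNetv3 weights and training data are not reproduced here.

```python
# Minimal sketch of the "segmentation-classification" pipeline, assuming
# ultralytics YOLOv5, scikit-image SLIC, and torchvision MobileNetV3.
import numpy as np
import torch
from skimage import io
from skimage.color import label2rgb
from skimage.segmentation import slic
from torchvision import models, transforms

# 1) Locate moving objects (vehicles, motion shadows, black smoke) with YOLOv5s.
detector = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
frame = io.imread("frame.jpg")            # hypothetical road-camera frame
detections = detector(frame).xyxy[0]      # rows: (x1, y1, x2, y2, conf, class)

# 2) Convert each detected region into a superpixel image with SLIC,
#    replacing every superpixel by its mean colour.
def to_superpixels(crop: np.ndarray, n_segments: int = 200) -> np.ndarray:
    labels = slic(crop, n_segments=n_segments, compactness=10, start_label=1)
    avg = label2rgb(labels, crop, kind="avg", bg_label=0)
    return np.clip(avg, 0, 255).astype(np.uint8)

# 3) Classify the superpixel image with a lightweight MobileNetV3 head.
#    Weights are untrained here; in practice the model would be trained on the
#    three superpixel categories (vehicle, motion shadow, black smoke).
classifier = models.mobilenet_v3_small(weights=None, num_classes=3)
classifier.eval()
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
class_names = ["vehicle", "motion shadow", "black smoke"]  # illustrative order

with torch.no_grad():
    for x1, y1, x2, y2, conf, cls in detections.tolist():
        crop = frame[int(y1):int(y2), int(x1):int(x2)]
        if crop.size == 0:
            continue
        sp_img = to_superpixels(crop)
        logits = classifier(preprocess(sp_img).unsqueeze(0))
        print(class_names[int(logits.argmax())], float(conf))
```

Classifying superpixel images rather than raw crops is what lets the model separate shadow regions from genuine smoke plumes without an explicit shadow-removal stage.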