This study proposes a deep convolutional neural network based on an improved You Only Look Once version 3 (YOLOv3) model to improve the accuracy and real-time performance of small-target detection against complex backgrounds when detecting leaky weld studs on automotive workpieces. To predict stud locations, the number of prediction layers in the model is increased from three to four. An image pyramid structure obtains stud feature maps at different scales, and multi-scale fusion of shallow features captures stud contour details. Focal loss is added to the loss function to address the sample imbalance problem: reducing the weight of easily classified background samples lets the algorithm focus on foreground classes, which reduces the number of missed weld studs. Moreover, the K-medians algorithm replaces the original K-means clustering to improve model robustness. Finally, an image dataset of car body workpiece studs is built for model training and testing. The results show that the average detection accuracy of the improved YOLOv3 model is 80.42%, higher than that of Faster R-CNN, the single-shot multi-box detector (SSD), and the original YOLOv3. The detection time per image is just 0.32 s (62.8% and 23.8% faster than SSD and Faster R-CNN, respectively), fulfilling the requirement for stud leakage detection in real-world working environments.
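Two of the ingredients named above, focal loss and K-medians clustering, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `alpha` and `gamma` defaults, the deterministic initialization, and the use of K-medians to cluster anchor-box sizes under a 1 - IoU distance (as K-means is commonly used for YOLO-style anchors) are all assumptions made for illustration.

```python
import math
from statistics import median


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction.

    p is the predicted foreground probability, y is the label (1 or 0).
    alpha and gamma are illustrative defaults, not values from the study.
    """
    p = min(max(p, eps), 1.0 - eps)          # keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmedians_anchors(boxes, k, iters=100):
    """Cluster (width, height) pairs into k anchors with median updates.

    Assumption: boxes are assigned to the anchor with the highest IoU,
    and each anchor is updated to the per-coordinate median of its cluster.
    """
    # Hypothetical deterministic start: k boxes evenly spaced by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int((i + 0.5) * step)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda j: iou_wh(b, anchors[j]))
            clusters[best].append(b)
        updated = [
            (median(w for w, _ in c), median(h for _, h in c)) if c else anchors[j]
            for j, c in enumerate(clusters)
        ]
        if updated == anchors:               # converged
            break
        anchors = updated
    return sorted(anchors, key=lambda a: a[0] * a[1])


if __name__ == "__main__":
    # An easy background sample contributes far less than a hard foreground one.
    print(focal_loss(0.01, 0), focal_loss(0.10, 1))
    sizes = [(10, 10), (12, 11), (11, 12), (50, 52), (48, 50), (52, 49), (500, 500)]
    print(kmedians_anchors(sizes, k=2))
```

In this toy data the single 500 x 500 outlier barely moves the larger anchor, because a median update is less sensitive to extreme box sizes than a mean update. That is one plausible reading of the robustness benefit claimed for K-medians.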