Salient object detection on real-time surveillance images can help reduce traffic accidents and improve road safety. Existing salient object detection methods for intelligent driving that rely on real-time monitoring images have produced useful results, but shortcomings remain: in complex road environments they struggle to separate salient targets from the background, which leads to false and missed detections. This study therefore investigates a salient object detection method based on real-time monitoring image information for intelligent driving. First, the Visual Geometry Group (VGG) network discriminator in the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) is modified, and techniques such as spectral normalization (SN) are employed to improve the stability of the training dynamics. The modified network performs pixel-level image upscaling and feature enhancement on the salient objects in the dataset, providing a richer data foundation for the subsequent salient target detection and defect classification on real-time monitoring images. Second, the YOLOv5s algorithm is used as the identification network, and its original backbone is replaced with MobileNetV2, which significantly reduces network complexity and improves identification efficiency. Recognition of salient targets in real-time monitoring images for intelligent driving is further improved by optimizing the network optimizer and adopting a clustering algorithm. Experimental results confirm the efficacy of the proposed method.
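As an illustration of the spectral normalization (SN) step mentioned above, the following minimal sketch rescales a weight matrix by an estimate of its largest singular value obtained with power iteration. It is written in plain Python under stated assumptions and is not the implementation used in this study: the function name `spectral_normalize`, the helper functions, the iteration count, and the fixed seed are all hypothetical choices made for the example. In GAN training, SN is usually applied to each discriminator layer with a single power-iteration step per update and a persistent vector `u`; the sketch runs many iterations on one matrix only to make the estimate easy to check.

```python
import math
import random


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    """Return the transpose of matrix m as a list of rows."""
    return [list(col) for col in zip(*m)]


def normalize(x, eps=1e-12):
    """Scale x to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / (norm + eps) for a in x]


def spectral_normalize(weight, n_iters=50, seed=0):
    """Return (weight / sigma, sigma), where sigma estimates the largest
    singular value of weight via power iteration.

    Hypothetical helper for illustration; n_iters and seed are assumptions.
    """
    rng = random.Random(seed)
    wt = transpose(weight)
    # Random start vector in the output (row) space.
    u = normalize([rng.gauss(0.0, 1.0) for _ in weight])
    v = normalize(matvec(wt, u))
    for _ in range(n_iters):
        v = normalize(matvec(wt, u))      # v <- W^T u / ||W^T u||
        u = normalize(matvec(weight, v))  # u <- W v / ||W v||
    # sigma ~= u^T W v, the dominant singular value.
    sigma = sum(a * b for a, b in zip(u, matvec(weight, v)))
    return [[a / sigma for a in row] for row in weight], sigma


# Usage: a diagonal matrix whose largest singular value is 3 by construction.
w = [[3.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.5]]
w_sn, sigma = spectral_normalize(w)
```

After normalization the largest singular value of the matrix is approximately 1, which bounds the Lipschitz constant of the corresponding linear layer; this is the property that SN relies on to stabilize discriminator training.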