The multi-vision metro tunnel defect sensing system consists mainly of IRT and RGB cameras and can automatically identify and extract small defects on the tunnel lining surface, greatly improving detection efficiency. However, train vibration, inconsistent lighting, and fluctuating temperature and humidity leave the captured images unevenly illuminated, blurred, and lacking in detail. These problems degrade the fusion of the multi-vision images and raise the missed-detection rate. This paper proposes a multi-vision image fusion approach for metro tunnel defects based on saliency optimization of pixel-level defect image features. First, the method uses the train's motion state and the blurred image as constraints to remove motion blur. Second, it allocates image weights according to the illumination uniformity of the visible-light images in the tunnel and the real-time temperature and humidity. Finally, a U-Net that integrates a channel attention mechanism performs image feature extraction and fusion. Experimental results show that the approach increases the pixel-value variation rate by 39.7% and edge quality by 23%, and outperforms comparable approaches in average gradient, gradient quality, and sum of the correlations of differences, with improvements of 15.9%, 7.3%, and 26.6%, respectively.
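One of the reported metrics, average gradient (AG), measures how much local intensity variation (detail and edge sharpness) a fused image retains. The sketch below implements one common definition of AG under stated assumptions: the image is grayscale, given as a list of rows, and gradients are taken as forward differences over the (M-1) x (N-1) grid. The function name `average_gradient` is hypothetical, and the paper may use a different variant (for example, Sobel gradients or a different normalization).

```python
import math
from typing import Sequence


def average_gradient(img: Sequence[Sequence[float]]) -> float:
    """Average gradient of a grayscale image (hypothetical helper).

    Assumed definition, using forward differences:
        AG = 1 / ((M-1)(N-1)) * sum_{i,j} sqrt((dx^2 + dy^2) / 2)
    where dx and dy are the horizontal and vertical differences at pixel (i, j).
    """
    m = len(img)
    n = len(img[0]) if m else 0
    if m < 2 or n < 2:
        raise ValueError("image must be at least 2x2")
    if any(len(row) != n for row in img):
        raise ValueError("all rows must have the same length")

    total = 0.0
    for i in range(m - 1):
        for j in range(n - 1):
            dx = img[i][j + 1] - img[i][j]  # horizontal forward difference
            dy = img[i + 1][j] - img[i][j]  # vertical forward difference
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((m - 1) * (n - 1))


if __name__ == "__main__":
    flat = [[5, 5], [5, 5]]        # no intensity change anywhere
    edge = [[0, 10], [0, 10]]      # one vertical edge of height 10
    print(average_gradient(flat))  # 0.0
    print(average_gradient(edge))  # sqrt(50), about 7.071
```

A higher AG generally indicates a sharper fused image with more preserved texture, which is why it is often reported alongside edge-based measures when comparing fusion methods.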