Non-obstructive azoospermia (NOA) is a severe form of male infertility characterized by impaired or absent sperm production in the testes. Microsurgical testicular sperm extraction (micro-TESE) is the primary treatment for NOA, but it is limited by the difficulty of distinguishing normal from abnormal seminiferous tubules based on morphology alone. To address this, we used stimulated Raman scattering (SRS) and second harmonic generation (SHG) microscopy to identify diagnostic features in human testicular tissues. A deep learning-assisted diagnostic algorithm built on the multimodal imaging datasets performed well in azoospermia diagnosis: using a weakly supervised multiple instance learning convolutional neural network (MIL-CNN) framework, we achieved a classification accuracy of 96%, surpassing the supervised CNN model. Gradient-weighted class activation mapping (Grad-CAM) visualization confirmed that the model focused on the spermatogenic region. These results demonstrate the potential of SRS/SHG microscopy coupled with deep learning to accurately classify normal and abnormal spermatogenic tubules and to improve the efficiency and accuracy of pathological diagnosis.