Objective: We propose an accurate and robust X-ray spectrum estimation method that combines X-ray imaging physics with a convolutional neural network (CNN).
Approach: The approach relies on transmission measurements, and the estimated spectrum is formulated as a convolutional summation of a few model spectra generated using Monte Carlo simulation. The difference between the actual and estimated projections is used as the loss function to train the network. We compared this approach with the previously proposed weighted sum of model spectra approach. Comprehensive studies were performed to evaluate the robustness and accuracy of the proposed approach in various scenarios.
Main results: The CNN-based method estimated the spectrum accurately. The ME and NRMSE were -0.021 keV and 3.04% for 80 kVp, and 0.006 keV and 4.44% for 100 kVp, both better than the previous approach. The robustness test and experimental study also showed superior performance. The CNN-based approach yielded consistent results in phantoms with various material combinations and was robust to different spectrum generators and calibration phantoms.
Significance: We proposed a method for estimating the real spectrum by integrating a deep learning model with real imaging physics. The results demonstrated that this method is accurate and robust in estimating the spectrum, and it is potentially helpful for a broad range of X-ray imaging tasks.
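
Illustrative sketch: the physics-based loss described under Approach can be illustrated with a minimal example, assuming a polychromatic Beer-Lambert forward model and a single attenuating material. It uses the simpler weighted sum of model spectra (the comparison method) rather than the CNN, which is not reproduced here. All function names, attenuation coefficients, model spectra, and thicknesses below are hypothetical toy values, not data from this work.

```python
import math

def normalize(spectrum):
    """Scale a spectrum so its bins sum to 1."""
    total = sum(spectrum)
    return [s / total for s in spectrum]

def combine_spectra(weights, model_spectra):
    """Weighted sum of model spectra, renormalized to unit area."""
    n_bins = len(model_spectra[0])
    mixed = [
        sum(w * spec[k] for w, spec in zip(weights, model_spectra))
        for k in range(n_bins)
    ]
    return normalize(mixed)

def transmission(spectrum, mu, thickness):
    """Polychromatic Beer-Lambert transmission through one material."""
    return sum(s * math.exp(-m * thickness) for s, m in zip(spectrum, mu))

def projection_loss(weights, model_spectra, mu, thicknesses, measured):
    """Mean squared difference between estimated and measured transmissions."""
    estimate = combine_spectra(weights, model_spectra)
    errors = [
        (transmission(estimate, mu, t) - p) ** 2
        for t, p in zip(thicknesses, measured)
    ]
    return sum(errors) / len(errors)

# Toy setup (assumed values): three energy bins, two model spectra.
mu = [0.5, 0.3, 0.2]                      # attenuation per bin, 1/cm
model_spectra = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
thicknesses = [0.0, 1.0, 2.0, 4.0]        # cm

# Simulate "measured" transmissions from a known ground-truth mixture.
true_weights = [0.7, 0.3]
true_spectrum = combine_spectra(true_weights, model_spectra)
measured = [transmission(true_spectrum, mu, t) for t in thicknesses]

loss_true = projection_loss(true_weights, model_spectra, mu, thicknesses, measured)
loss_wrong = projection_loss([0.2, 0.8], model_spectra, mu, thicknesses, measured)
```

In this toy setting the loss vanishes at the true weights and is positive for a mismatched mixture, which is the signal a projection-domain loss relies on. In the proposed method, a trained CNN would take the place of the explicit weight vector.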