Background
Cone beam computed tomography (CBCT) provides critical anatomical information for adaptive radiotherapy (ART), especially for tumors in the pelvic region that undergo significant deformation. However, CBCT suffers from inaccurate Hounsfield Unit (HU) values and lower soft tissue contrast. These issues compromise the accuracy of pelvic treatment planning and delivery and therefore require correction.

Purpose
A novel stacked coarse‐to‐fine model combining a Denoising Diffusion Probabilistic Model (DDPM) with spatial‐frequency domain convolution modules is proposed to enhance the quality of CBCT images.

Methods
The enhancement of low‐quality CBCT images is divided into two stages. In the coarse stage, an improved DDPM with a U‐ConvNeXt architecture denoises the CBCT images. In the fine stage, a deep convolutional network built jointly from fast Fourier and dilated convolution modules further enhances image quality in both local detail and global appearance. The final output is an accurate pseudo‐CT (pCT) image of the same size as the original data. Two hundred fifty paired CBCT‐CT images from cervical and rectal cancer cases, combined with 200 cases from a public dataset, were used collectively for training, validation, and testing.

Results
To evaluate the anatomical consistency between pCT and real CT, we report the mean (standard deviation) of the structural similarity index measure (SSIM), peak signal‐to‐noise ratio (PSNR), and normalized cross‐correlation (NCC). For cervical cancer cases, the pCT synthesized by the proposed model achieved 87.14% (2.91%), 34.02 dB (1.35 dB), and 88.01% (1.82%) on these three metrics, respectively, when compared against real CT. For rectal cancer cases, the corresponding results were 86.06% (2.70%), 33.50 dB (1.41 dB), and 87.44% (1.95%).
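As a rough illustration of two of the reported image metrics, the following minimal NumPy sketch computes PSNR and a zero‐mean NCC between a real CT slice and a pCT slice. The function names, the `data_range` argument, and the zero‐mean form of NCC are assumptions made for illustration; the exact definitions and intensity normalization used in the study are not stated in this abstract.

```python
import numpy as np


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed span of valid intensities
    (hypothetical parameter; the study's choice is not given here).
    """
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def ncc(reference, estimate):
    """Zero-mean normalized cross-correlation, in [-1, 1] (assumed form)."""
    a = np.asarray(reference, dtype=np.float64)
    b = np.asarray(estimate, dtype=np.float64)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    if denom == 0:
        raise ValueError("NCC is undefined for a constant image")
    return float(np.sum(a * b) / denom)


# Synthetic stand-ins for a CT slice and a pCT slice (not study data).
rng = np.random.default_rng(0)
ct = rng.uniform(-1000.0, 1000.0, size=(64, 64))
pct = ct + rng.normal(0.0, 20.0, size=ct.shape)

print("PSNR (dB):", psnr(ct, pct, data_range=2000.0))
print("NCC:", ncc(ct, pct))
```

In practice these values would be computed per case and then summarized as mean (standard deviation), matching how the results are reported.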
Paired t‐tests between the proposed model and the comparative models (ResUnet, CycleGAN, DDPM, and DDIM) on these metrics revealed statistically significant differences (p < 0.05). Visual comparison also showed that the anatomical structures in the pCT synthesized by the proposed model were closer to those of the real CT. For dosimetric verification, the mean absolute error of dosimetry (MAEdoes) was analyzed for the maximum dose (Dmax), the minimum dose (Dmin), and the mean dose (Dmean) in the planning target volume (PTV), with results presented as mean (lower quartile, upper quartile). For the pCT images synthesized by the proposed model, the MAEdoes values for Dmin, Dmax, and Dmean were 0.90% (0.48%, 1.29%), 0.82% (0.47%, 1.17%), and 0.57% (0.44%, 0.67%), respectively. A Mann–Whitney test against 10 cases of the original CBCT images (p < 0.05) further indicated that pCT significantly improves the accuracy of HU values for dose calculation.

Conclusion
The pCT synthesized by the proposed model outperforms the comparative models in numerical accuracy and visual quality, and is promising for ART of pelvic cancers.
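The dosimetric summary reported in the Results, mean (lower quartile, upper quartile) of a percentage dose error, can be sketched as follows. This is only an illustration under stated assumptions: the per‐case error is taken to be the absolute dose difference relative to the CT‐based dose, the function names are hypothetical, and the input numbers are made up rather than taken from the study.

```python
import numpy as np


def relative_dose_error_percent(dose_ct, dose_pct):
    """Per-case absolute dose error as a percentage of the CT-based dose.

    Assumed definition: 100 * |D_pCT - D_CT| / |D_CT|.
    """
    d_ct = np.asarray(dose_ct, dtype=np.float64)
    d_pct = np.asarray(dose_pct, dtype=np.float64)
    return 100.0 * np.abs(d_pct - d_ct) / np.abs(d_ct)


def summarize(errors):
    """Return (mean, lower quartile, upper quartile) of the errors."""
    q1, q3 = np.percentile(errors, [25, 75])
    return float(np.mean(errors)), float(q1), float(q3)


# Made-up PTV Dmean values in Gy for four cases (illustrative only).
dmean_ct = [50.0, 50.0, 50.0, 50.0]
dmean_pct = [50.5, 49.5, 51.0, 50.0]

errors = relative_dose_error_percent(dmean_ct, dmean_pct)
mean, q1, q3 = summarize(errors)
print(f"MAE for Dmean: {mean:.2f}% ({q1:.2f}%, {q3:.2f}%)")
```

The same two functions would be applied separately to Dmin and Dmax to obtain the other two summaries.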