Automatic medical image segmentation has shown great potential in recent years. However, magnetic resonance imaging (MRI) scans often contain noise and artifacts, so existing methods cannot accurately segment boundaries. In addition, most existing algorithms cannot effectively capture global dependencies to offset their local inductive bias. In this work, we present DiffSwinTr, a novel denoising diffusion model based on the 3D Swin Transformer for brain tumor segmentation. Furthermore, a conditional encoder module is designed to extract multi-scale features and enhance local feature perception. We extensively evaluate our model on three BraTS datasets. The proposed DiffSwinTr achieves average Dice scores of 79.91%, 83.07%, and 85.38%, and average 95% Hausdorff distances of 3.361, 3.334, and 2.975 mm, respectively. The experimental results show that DiffSwinTr outperforms state-of-the-art segmentation methods. Moreover, the proposed model is robust when segmenting images with noise and artifacts.
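
For readers unfamiliar with the Dice score reported above, the following is a minimal sketch of how it is commonly computed for a pair of binary masks. It is illustrative only and is not taken from the DiffSwinTr evaluation code: the function name `dice_score`, the flattened-list input format, and the convention of returning 1.0 when both masks are empty are all assumptions made here.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for two flattened binary masks.

    `pred` and `target` hold one 0/1 label per voxel (hypothetical format).
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")

    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Both masks are empty; treating this as perfect agreement is an
        # assumed convention, and other conventions exist.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    ground_truth = [1, 0, 0, 0]
    print(f"Dice: {dice_score(prediction, ground_truth):.4f}")
```

A score of 1 indicates perfect overlap and 0 indicates none. Multiplying by 100 gives a percentage in the same form as the values quoted above.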