Retinal vessel segmentation in fundus images plays an essential role in the screening, diagnosis, and treatment of many diseases. Acquired fundus images, however, commonly suffer from uneven illumination, high noise, and complex structure, which makes vessel segmentation very challenging. Previous retinal vessel segmentation methods mainly use convolutional neural networks based on the U-Net architecture, and they have notable limitations, such as the loss of microvascular detail at the ends of vessels. We address the limitations of convolution by introducing the transformer into retinal vessel segmentation, and propose a hybrid method based on modulated deformable convolution and the transformer, named DT-Net. Firstly, multi-scale image features are extracted by deformable convolution and multi-head self-attention (MHSA). Secondly, image information is recovered and vessel morphology is refined by the proposed transformer decoder block. Finally, local prediction results are obtained by the side output layer, and segmentation accuracy is further improved by a hybrid loss function. Experimental results show that our method obtains good segmentation performance in terms of Specificity (SP), Sensitivity (SE), Accuracy (ACC), Area Under the Curve (AUC), and F1-score on three publicly available fundus datasets: DRIVE, STARE, and CHASE_DB1.
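The MHSA operation mentioned above can be sketched as follows. This is a minimal NumPy illustration of standard multi-head self-attention over a flattened feature map, not the paper's implementation; the weight matrices `w_q`, `w_k`, `w_v`, `w_o` and the helper name are illustrative assumptions.

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Minimal MHSA sketch: x has shape (seq_len, d_model), where the
    sequence is a flattened spatial feature map. Weight matrices are
    (d_model, d_model); this is an illustrative, not official, version."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Split into heads: (num_heads, seq_len, d_head)
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over keys

    # Merge heads back to (seq_len, d_model) and project
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o
```

Because attention weights relate every spatial position to every other, this operation captures the long-range context that plain convolution lacks, which is the motivation for pairing it with deformable convolution in the encoder.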