Over the past several years, semantic segmentation methods based on deep learning, especially Unet, have achieved tremendous success in medical image processing. The U-shaped topology of Unet is well suited to image segmentation tasks. However, because conventional convolution operations have a limited receptive field, Unet cannot model global semantic interactions. To address this problem, this paper proposes RT-Unet, which combines the advantages of the Transformer and the Residual network for accurate medical image segmentation. In RT-Unet, the Residual block serves as the image feature extraction layer to alleviate gradient degradation and obtain more effective features. In addition, a Skip-Transformer built on Multi-head Self-Attention is proposed to replace the original Skip-Connection layer in Unet, which avoids the influence of shallow features on the network's performance. An attention module is also added to the decoder to reduce semantic differences. Experiments on the MoNuSeg and ISBI_2018cell data sets show that RT-Unet achieves better segmentation performance than other deep learning-based algorithms. In addition, a series of further ablation experiments were conducted on Residual network and