Owing to advances in computing power and computer technology, deep learning has penetrated many areas of medicine. Segmenting lesion areas in medical scans can help clinicians make accurate diagnoses. Convolutional neural networks (CNNs), a dominant tool in computer vision, can accurately locate and classify lesion areas. However, because of their inherent inductive bias, CNNs may fail to capture long-range dependencies in medical images, which leads to a less accurate grasp of image details. To address this problem, we explore a Transformer-based solution (OstT) and study its feasibility in medical imaging tasks. First, we perform super-resolution reconstruction on the original osteosarcoma MRI images to enhance the texture features of the tissue structure, which reduces the training error caused by unclear tissue structure in the images. Then, we propose a Transformer-based method for medical image segmentation. It uses a gated axial attention model, which augments existing architectures by introducing an additional control mechanism in the self-attention module to improve segmentation accuracy. Experiments on real datasets show that our method outperforms existing models such as U-Net and can effectively assist doctors in imaging examinations.
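
To make the idea of a gated axial attention module concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes the commonly described formulation in which scalar gates scale the positional terms inside a 1D self-attention that is applied first along the height axis and then along the width axis. The function names, the gate names (`g_q`, `g_k`, `g_v1`, `g_v2`), the initial values, the per-position-pair embedding tables, and the square-root scaling are all illustrative assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(length, channels, rng):
    # Hypothetical initialisation; a trained model would learn all of these.
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)

    return {
        "Wq": w(channels, channels),
        "Wk": w(channels, channels),
        "Wv": w(channels, channels),
        # Positional embeddings indexed by (query position, key position).
        "r_q": w(length, length, channels),
        "r_k": w(length, length, channels),
        "r_v": w(length, length, channels),
        # Scalar gates controlling how much the positional terms contribute.
        "g_q": 0.1, "g_k": 0.1, "g_v1": 1.0, "g_v2": 0.1,
    }


def gated_attention_1d(x, p):
    """Gated self-attention along one axis. x has shape (batch, length, channels)."""
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    logits = np.einsum("blc,bmc->blm", q, k)
    logits = logits + p["g_q"] * np.einsum("blc,lmc->blm", q, p["r_q"])
    logits = logits + p["g_k"] * np.einsum("bmc,lmc->blm", k, p["r_k"])
    # Scaling by sqrt(channels) is an assumption of this sketch.
    attn = softmax(logits / np.sqrt(x.shape[-1]), axis=-1)
    out = p["g_v1"] * np.einsum("blm,bmc->blc", attn, v)
    out = out + p["g_v2"] * np.einsum("blm,lmc->blc", attn, p["r_v"])
    return out


def gated_axial_attention(x, p_h, p_w):
    """x has shape (H, W, C). Attend along height, then along width."""
    # Height pass: treat each column as one sequence of length H.
    x = gated_attention_1d(x.transpose(1, 0, 2), p_h).transpose(1, 0, 2)
    # Width pass: treat each row as one sequence of length W.
    return gated_attention_1d(x, p_w)


rng = np.random.default_rng(0)
H, W, C = 4, 5, 3
feature_map = rng.normal(size=(H, W, C))
p_h, p_w = init_params(H, C, rng), init_params(W, C, rng)
y = gated_axial_attention(feature_map, p_h, p_w)
```

Factorising attention into two 1D passes reduces the cost per position from attending over all H×W locations to attending over H + W locations, and setting the positional gates to zero recovers plain content-based attention along each axis.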