Accurate segmentation of the left atrial structure using magnetic resonance images provides an important basis for the diagnosis of atrial fibrillation (AF) and its treatment using robotic surgery. In this study, an image segmentation method based on sequence relationship learning and multi-scale feature fusion is proposed for 3D to 2D sequence conversion in cardiac magnetic resonance images and the varying scales of left atrial structures within different slices. Firstly, a convolutional neural network layer with an attention module was designed to extract and fuse contextual information at different scales in the image, to strengthen the target features using the correlation between features in different regions within the image, and to improve the network’s ability to distinguish the left atrial structure. Secondly, a recurrent neural network layer oriented to two-dimensional images was designed to capture the correlation of left atrial structures in adjacent slices by simulating the continuous relationship between sequential image slices. Finally, a combined loss function was constructed to reduce the effect of positive and negative sample imbalance and improve model stability. The Dice, IoU, and Hausdorff distance values reached 90.73%, 89.37%, and 4.803 mm, respectively, based on the LASC2013 (left atrial segmentation challenge in 2013) dataset; the corresponding values reached 92.05%, 89.41% and 9.056 mm, respectively, based on the ASC2018 (atrial segmentation challenge at 2018) dataset.