Both global contextual information and local detail are crucial for segmenting coal rock fractures. To obtain global information, most existing schemes feed the feature maps output by a CNN directly into a Transformer. Because the size of these feature maps is limited, the Transformer cannot extract further multi-scale information about the cracks, which makes such schemes poorly suited to cracks of different sizes. Moreover, the feature maps contain data that correlate only weakly with cracks, and feeding them directly into the Transformer degrades the extraction of global crack information. U-Net-based networks are one of the mainstream solutions for extracting local detail from images, but the low-level features carried by their skip connections contain redundant information, which leads directly to false positives in the segmentation results. To address these issues, this paper designs the Channel Attention Atrous Spatial Pyramid Pooling module (TD-ASPP) and the Cross Attention Non-Local Block (CANB) and integrates them into TransUNet, yielding a coal rock fracture image segmentation network based on attention mechanisms and multi-scale features (MS-Unet). First, the TD-ASPP module captures multi-scale crack features while strengthening channels that correlate strongly with cracks and weakening channels that correlate weakly, which improves the accuracy with which the Transformer computes global crack information. Second, the CANB module uses high-level features to supervise and strengthen low-level features, enhancing local crack details and suppressing false positives.
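
As a rough illustration of the channel reweighting idea described for TD-ASPP (strengthening channels that respond to cracks and weakening the rest), the following is a minimal sketch in plain Python. It is not the paper's implementation: the function `channel_gate`, its per-channel `weights` and `biases`, and the toy feature map are hypothetical, and the gate is a simplified sigmoid gate in the squeeze-and-excitation style, chosen only for illustration.

```python
import math


def channel_gate(feature_map, weights, biases):
    """Reweight the channels of a C x H x W feature map given as nested lists.

    Hypothetical, simplified gate: each channel is summarised by its global
    average, passed through a per-channel affine map and a sigmoid, and the
    whole channel is then scaled by the resulting value in (0, 1).
    """
    gated = []
    for channel, w, b in zip(feature_map, weights, biases):
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count   # global average pool
        gate = 1.0 / (1.0 + math.exp(-(w * mean + b)))    # sigmoid gate
        gated.append([[gate * value for value in row] for row in channel])
    return gated


# Toy 2-channel, 2 x 2 feature map (made-up values for illustration only).
features = [
    [[1.0, 1.0], [1.0, 1.0]],   # channel with a strong response
    [[0.1, 0.0], [0.0, 0.1]],   # channel with a weak response
]
out = channel_gate(features, weights=[4.0, 4.0], biases=[-2.0, -2.0])
```

With these assumed parameters the strongly responding channel keeps most of its magnitude while the weakly responding one is scaled down further. In an actual network the gate parameters would be learned, and the gating would be applied to tensors after the atrous convolution branches rather than to nested lists.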