A thorough understanding of how the 3D meso-structure affects damage and failure patterns is essential for revealing the failure conditions of composite rock materials such as coal, concrete, and marble. This paper presents a 3D X-ray computed tomography (XCT) dataset of coal rock comprising 1372 slices, each of 1720 × 1771 pixels in the x × y directions. The dataset was acquired with a MicroXMT-400 using the 225/320 kV Nikon Metris custom bay. The raw data were processed with an automatic semantic segmentation method based on the Swin Transformer (Swin-T) architecture, which aims to overcome the large errors and low efficiency of traditional methods. The proposed hybrid loss function incorporates modulation terms into the cross-entropy loss, which mitigates the influence of large-volume features during training and thereby improves segmentation accuracy for small-volume features. The dataset will be made available to researchers for further finite element analysis or microstructural statistical analysis involving complex physical and mechanical behaviors at different scales.
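To illustrate how a modulation term can reduce the weight of large-volume features, the sketch below assumes a focal-loss-style factor (1 - p_t)^gamma multiplied into the per-pixel cross-entropy, where p_t is the predicted probability of the true class. This abstract does not specify the form of the hybrid loss, so the factor, the exponent `gamma`, the optional `class_weights`, and all function names here are hypothetical and may differ from the loss actually used.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_cross_entropy(logits, target, gamma=2.0, class_weights=None):
    """Cross-entropy for one pixel, scaled by an assumed modulation term.

    The factor (1 - p_t) ** gamma is an assumption borrowed from focal loss:
    pixels that are already classified confidently (typically the abundant,
    large-volume classes) contribute little, so small-volume classes carry
    relatively more weight. gamma = 0 recovers plain cross-entropy.
    """
    probs = softmax(logits)
    # Clamp to avoid log(0) for extreme logits.
    p_t = max(probs[target], 1e-12)
    weight = 1.0 if class_weights is None else class_weights[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_modulated_loss(batch_logits, batch_targets, gamma=2.0, class_weights=None):
    """Average the modulated loss over a batch of pixels."""
    losses = [
        modulated_cross_entropy(logits, target, gamma, class_weights)
        for logits, target in zip(batch_logits, batch_targets)
    ]
    return sum(losses) / len(losses)


# Usage: one confidently correct pixel and one badly misclassified pixel.
easy = modulated_cross_entropy([5.0, 0.0], target=0)
hard = modulated_cross_entropy([0.0, 5.0], target=0)
print(easy < hard)
```

With this assumed form, the confidently correct pixel is down-weighted by several orders of magnitude while the misclassified pixel keeps almost its full cross-entropy value, which is the qualitative behavior described for small-volume features.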