To address the low segmentation accuracy on small-scale objects and the insufficient segmentation of local boundaries in deep-learning-based semantic segmentation methods, this paper proposes an image semantic segmentation approach based on attention mechanisms and feature fusion. While maintaining overall accuracy, the approach improves segmentation accuracy on small-scale objects and local boundaries, meeting the requirement of accurately segmenting objects against complex backgrounds. First, an image semantic segmentation model based on hybrid cascade and feature fusion is proposed, in which hybrid concatenation and multi-core pooling are used to extract deeper semantic information. Second, a cross-stage fusion approach is designed that divides the encoder's backbone network and the improved Atrous Spatial Pyramid Pooling (ASPP) module into three stages, so that the different semantic information of the shallow and deep layers is fully utilized. Third, the attention mechanism is introduced into the hybrid cascade and feature fusion network, and an image semantic segmentation model based on cross-stage fusion and attention mechanisms is presented. Self-attention is added to the channel attention to strengthen the connections between feature maps, and one-dimensional convolution is used in the spatial attention mechanism to enlarge the spatial receptive field. Results on the public datasets PASCAL VOC2012 and SUIM show that the mean intersection over union (mIoU) reaches 86.68% and 61.55%, respectively, indicating that the overall accuracy of the proposed approach is higher than that of the other methods compared.

INDEX TERMS Semantic segmentation, hybrid cascade feature fusion, detailed attention mechanism, deep learning.
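
The mIoU figures quoted above follow the standard definition: for each class, the number of pixels correctly assigned to that class is divided by the number of pixels that belong to it in either the prediction or the ground truth, and the per-class ratios are averaged. The following is a minimal sketch of that metric only, not the evaluation code used in this paper. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both inputs are illustrative assumptions; ignore labels and dataset-level accumulation are not handled.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union for flat sequences of class labels.

    Hypothetical helper for illustration; labels must lie in [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        # Assumption: a class absent from both inputs is skipped
        # rather than counted as IoU = 0.
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Because the per-class ratios are averaged with equal weight, a class that covers few pixels influences the score as much as a dominant one, which is why the metric is sensitive to accuracy on small-scale objects.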