Semantic segmentation is a fundamental computer vision task to which deep learning methods have been successfully applied. However, predictions of target morphology are still often incomplete, owing to low feature utilisation and insufficient spatial location information. This paper proposes a novel cross fusion network with a unit attention mechanism (CF‐Net) for semantic segmentation. The framework has two hallmarks: a multi‐scale fusion module and the unit attention mechanism. The multi‐scale fusion module integrates the outputs of multiple branches with different receptive fields to capture both fine‐grained target details and visual contextual information. The cross fusion network uses the unit attention mechanism to fuse intermediate features, which yields more accurate and effective spatial location information while maintaining consistency in the feature space. Experimental results on the CamVid, Cityscapes, and PASCAL VOC 2012 datasets show that CF‐Net compares favourably with other existing methods, which verifies the effectiveness and reliability of the proposed method.
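The abstract does not specify the internals of the multi‐scale fusion module, so the following is only an illustrative sketch of the general idea of fusing branches with different receptive fields. It assumes a common design in which parallel 3×3 convolution branches with dilation rates 1, 2, and 3 (receptive fields of 3, 5, and 7 pixels) process the same single‐channel feature map and are then combined by a weighted sum. The function names `dilated_conv2d` and `multi_scale_fusion`, the dilation rates, the equal fusion weights, and the use of NumPy are all hypothetical choices, not the CF‐Net implementation.

```python
import numpy as np


def dilated_conv2d(x, kernel, dilation):
    """'Same'-padded 2D cross-correlation of a single-channel map with a dilated square kernel.

    Hypothetical helper for illustration; assumes an odd kernel size.
    """
    k = kernel.shape[0]
    pad = dilation * (k - 1) // 2          # keeps the output the same size as x
    xp = np.pad(x, pad, mode="constant")   # zero padding
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Each kernel tap samples the input at an offset scaled by the dilation rate.
            out += kernel[i, j] * xp[i * dilation:i * dilation + h,
                                     j * dilation:j * dilation + w]
    return out


def multi_scale_fusion(x, kernels, dilations, weights=None):
    """Run parallel dilated branches and fuse them with a weighted sum (assumed design)."""
    branches = [dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)]
    stacked = np.stack(branches)           # shape: (num_branches, H, W)
    if weights is None:
        weights = np.full(len(branches), 1.0 / len(branches))
    weights = np.asarray(weights, dtype=float)
    fused = np.tensordot(weights, stacked, axes=1)  # weighted sum over branches -> (H, W)
    return fused, stacked


# Usage: a single impulse shows how each branch spreads it over a different footprint.
x = np.zeros((9, 9))
x[4, 4] = 1.0
kernels = [np.ones((3, 3))] * 3
fused, stacked = multi_scale_fusion(x, kernels, dilations=(1, 2, 3))
print(stacked[2][1, 1], stacked[0][1, 1])  # only the dilation-3 branch reaches (1, 1)
```

Because the branches respond at different spatial offsets, the fused map mixes local detail (small dilation) with wider context (large dilation); a learned network would replace the fixed kernels and equal weights with trained parameters.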