Because self-attention mechanisms encode spatial information efficiently, Transformer-based models have recently come to dominate semantic segmentation. However, Transformer-based models are computationally expensive and tend to neglect fine details, which motivates us to revisit CNNs. In this paper, we propose a multi-path semantic segmentation network with convolutional attention guidance (dubbed MCAG). MCAG adopts a multi-path architecture in which features from the main path guide the other paths, forcing the model to focus on object boundaries and details. It also exploits multi-scale convolutional features through spatial attention. Finally, it adaptively captures both local and global context along the spatial and channel dimensions. Extensive experiments on popular benchmarks show that MCAG surpasses other state-of-the-art (SOTA) methods, achieving 47.7%, 82.51%, and 43.6% mIoU on ADE20K, Cityscapes, and COCO-Stuff, respectively. In particular, the results show that the proposed model segments small objects with high precision, demonstrating the effectiveness of the convolutional attention mechanism and the multi-path strategy. These results indicate that a CNN-based model can achieve strong segmentation performance at a lower computational cost.
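The abstract does not specify how MCAG's attention module is built. As a rough illustration only, the following sketch shows one generic form of multi-scale convolutional spatial attention on a single-channel feature map: kernels of several sizes are applied in parallel, their responses are summed and passed through a sigmoid, and the resulting map reweights the input pixel by pixel. The averaging kernels (stand-ins for learned weights), the kernel sizes 3/5/7, the sigmoid gate, and all function names are assumptions chosen for illustration, not the paper's design; channel attention and the multi-path guidance are not modeled.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """2D cross-correlation with zero padding; output keeps x's shape (odd kernel sizes)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - ph, j + b - pw
                    # Positions outside the map contribute zero (zero padding).
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[a][b]
            out[i][j] = s
    return out


def box_kernel(size: int) -> Matrix:
    """Hypothetical stand-in for a learned kernel: a normalized size x size averaging filter."""
    v = 1.0 / (size * size)
    return [[v] * size for _ in range(size)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def multi_scale_spatial_attention(x: Matrix, sizes: Sequence[int] = (3, 5, 7)) -> Matrix:
    """Sum multi-scale conv responses, gate with a sigmoid, and reweight x element-wise."""
    branches = [conv2d_same(x, box_kernel(s)) for s in sizes]
    h, w = len(x), len(x[0])
    # Attention weight per pixel: sigmoid of the summed branch responses, in (0, 1).
    att = [[sigmoid(sum(br[i][j] for br in branches)) for j in range(w)] for i in range(h)]
    return [[att[i][j] * x[i][j] for j in range(w)] for i in range(h)]
```

Averaging kernels keep the example deterministic; in a trained network the kernels would be learned, and the larger kernels widen the neighborhood that informs each attention weight, which is the intuition behind combining several scales.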