Although the Mask region-based convolutional neural network (R-CNN) model possessed a dominant position for complex and variable road scene segmentation, some problems still existed, including insufficient feature expressive ability and low segmentation accuracy. To address these problems, a novel road scene segmentation algorithm based on the modified Mask R-CNN was proposed. The multi-scale backbone network, Res2Net, was utilized to replace the ResNet network, and aimed to improve the feature extraction capability. The soft non-maximum suppression algorithm with attenuation function (soft-NMS) was adopted to improve detection efficiency in the case of a higher overlap rate. The comparison analyses of partition accuracy for various models were performed on the adopted Cityscapes dataset. The results demonstrated that the modified Mask R-CNN effectively increased the segmentation accuracy, especially for small and highly overlapping objects. The adopted Res2Net and soft-NMS can effectively enhance the feature extraction and improve segmentation performance. The average accuracy of the modified Mask R-CNN model reached up to 0.321, and was 0.054 higher than Mask R-CNN. This work provides important guidance to design a more efficient road scene instance segmentation algorithm for further promoting the actual application in automatic driving systems.