Semantic segmentation of urban street scenes has attracted considerable attention in autonomous driving, because it helps vehicles perceive their environment in real time and improves the decision-making ability of autonomous driving systems. However, most current methods based on Convolutional Neural Networks (CNNs) encode the input image into a low-resolution representation and then attempt to recover the high resolution, which leads to loss of spatial information, accumulation of errors, and difficulty in handling large variations in object scale. To address these problems, we propose HRDLNet, a semantic segmentation network for urban street scene images that improves segmentation accuracy by maintaining a high-resolution representation of the image throughout the network. First, we propose a feature extraction module with high-resolution representation (FHR), which handles multi-scale targets and high-resolution image information by efficiently fusing high-resolution information with multi-scale features. Second, we design a multi-scale feature extraction enhancement (MFE) module, which significantly expands the receptive field of the network and thereby enhances its ability to capture correlations between image details and global contextual information. Third, we introduce a dual-attention mechanism module (CSD), which dynamically adjusts the network to capture subtle features and rich semantic information in images more accurately. We trained and evaluated HRDLNet on the Cityscapes dataset and the PASCAL VOC 2012 Augmented dataset; the results, together with comparisons against state-of-the-art methods, verify the model’s strong performance and its advantages in semantic segmentation of urban street scenes.
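The loss of spatial information caused by encoding to a low resolution and then recovering the high resolution can be illustrated with a toy one-dimensional example. The sketch below is not part of HRDLNet: the function names, the 16-sample scanline, the pooling factor of 4, and the choice of average pooling with nearest-neighbour upsampling are all assumptions made for illustration.

```python
def avg_pool_1d(row, factor):
    """Downsample by averaging non-overlapping windows of `factor` samples."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


def upsample_nearest_1d(row, factor):
    """Upsample by repeating each sample `factor` times (nearest neighbour)."""
    return [value for value in row for _ in range(factor)]


def round_trip(row, factor):
    """Encode to low resolution, then try to recover the original resolution."""
    return upsample_nearest_1d(avg_pool_1d(row, factor), factor)


# Hypothetical scanline: a one-pixel-wide pole (1.0) on a flat background (0.0).
scanline = [0.0] * 16
scanline[5] = 1.0

recovered = round_trip(scanline, 4)

# The thin structure is smeared over its whole pooling window and attenuated.
peak_before = max(scanline)   # response of the pole at full resolution
peak_after = max(recovered)   # response after the low-resolution round trip
print(peak_before, peak_after)
```

In this toy case the sharp one-pixel response is spread across four samples at a quarter of its original strength, so its exact position and contrast cannot be recovered from the low-resolution encoding alone. Thin structures in real street scenes are affected in a similar way, which is the motivation for keeping a high-resolution representation throughout the network.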