Semantic segmentation of remote sensing images (SSRSI), which aims to assign a category to each pixel in a remote sensing image, plays a vital role in a broad range of applications, such as environmental monitoring, urban planning, and land resource utilization. Recently, with the successful application of deep learning in remote sensing, a substantial amount of work has been devoted to developing SSRSI methods based on deep learning models. In this survey, we provide a comprehensive review of SSRSI. First, we review the current mainstream semantic segmentation models based on deep learning. Next, we analyze the main challenges faced by SSRSI and summarize the current research status of deep learning-based SSRSI, with particular attention to several new directions: semi-supervised and weakly-supervised SSRSI, unsupervised domain adaptation (UDA) in SSRSI, multi-modal data fusion-based SSRSI, and pretrained models for SSRSI. Then, we examine the most widely used datasets and metrics and review the quantitative results and experimental performance of some representative SSRSI methods. Finally, we discuss promising future research directions in this area.