Image semantic segmentation is an important branch of computer vision with a wide variety of practical applications, such as medical image analysis, autonomous driving, and virtual or augmented reality. In recent years, transformers and multilayer perceptrons (MLPs) have achieved performance in computer vision comparable to that of convolutional neural networks (CNNs). As a result, a substantial body of image semantic segmentation work has aimed at developing different types of deep learning architectures. This survey provides a comprehensive overview of deep learning methods for general image semantic segmentation. First, the commonly used image segmentation datasets are listed. Next, a broad range of pioneering works is examined in depth from multiple perspectives (e.g., network structures, feature fusion methods, attention mechanisms). These works are divided into four categories according to their network architectures: CNN-based, transformer-based, MLP-based, and others. Furthermore, this paper presents common evaluation metrics and compares the respective advantages and limitations of popular techniques, both in terms of architectural design and their experimental results on the most widely used datasets. Finally, possible future research directions and open challenges are discussed for the reference of other researchers.