Image segmentation is an important research topic in image processing and machine vision, and autonomous driving is one of its main application scenarios. In-vehicle systems face many constraints on power supply and communication, yet the vast majority of current image segmentation algorithms are based on deep learning models. Although these models achieve very high segmentation accuracy, problems such as mesh artifacts and excessive segmentation are evident, and the costly, compute-intensive, and power-hungry hardware they require is difficult to deploy in real-world scenarios. This paper focuses on constructing a road scene segmentation model that has a simple structure and does not need large computing power, while maintaining a certain level of accuracy. The ESPNet (Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation) model is first introduced in detail, and an improved ESPNet model is then proposed. First, the network structure of the ESPNet model is optimized; then, the model is further optimized using a small amount of weakly labeled and unlabeled scene sample data. Finally, the new model is applied to the segmentation of video images captured by a dash cam. Experiments on Cityscapes, PASCAL VOC 2012, and other datasets verify that the proposed algorithm is faster and requires fewer than 1% of the parameters of other algorithms, making it suitable for mobile terminals.