Precise semantic segmentation of moving targets remains a challenging task in computer vision. Existing methods have notable limitations, such as an inability to handle the deformation of moving targets and blurred object boundaries. To address these issues, this paper develops an improved U-Net model based on an attention mechanism. First, we introduce an attention mechanism to enhance the perceptual ability of the U-Net model: by adding attention modules at different levels between the encoder and decoder, the network can focus on the key features of moving targets at each level. Second, we add a residual module to improve robustness and complete the capsule network for semantic segmentation of moving targets. By learning the deformation information of moving targets, the network adapts better to targets of different shapes. We verify the method experimentally on multiple public datasets. The results show that the proposed method achieves superior performance on semantic segmentation of moving targets and, compared with traditional U-Net-based models, significantly improves accuracy and robustness.
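The abstract does not specify the form of the attention modules placed between the encoder and decoder. As a minimal sketch only, the example below assumes an additive attention gate of the kind commonly used in attention U-Net variants: each pixel of the encoder skip features is scaled by a coefficient in (0, 1) computed from the skip features and a decoder gating signal. The function names (`linear`, `attention_gate`), the weights `w_x`, `w_g`, `psi`, and the toy values are all hypothetical and are not taken from the paper. Plain Python lists are used to keep the sketch dependency-free.

```python
import math


def linear(weights, vec):
    """Apply a 1x1 convolution at one pixel: a matrix-vector product over channels."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def attention_gate(skip, gate, w_x, w_g, psi):
    """Additive attention gate on a skip connection (illustrative assumption).

    skip, gate: feature maps as nested lists of shape [H][W][C]; the gating
                signal is assumed to be already resized to the skip resolution.
    w_x, w_g:   1x1 convolution weights mapping channels to an intermediate size.
    psi:        weight vector mapping the intermediate features to one score.
    Returns (gated skip features, attention map).
    """
    gated, attn = [], []
    for skip_row, gate_row in zip(skip, gate):
        gated_row, attn_row = [], []
        for x, g in zip(skip_row, gate_row):
            # Combine encoder and decoder features, then apply ReLU.
            hidden = [max(0.0, a + b)
                      for a, b in zip(linear(w_x, x), linear(w_g, g))]
            # Collapse to a scalar score and squash it with a sigmoid.
            score = sum(p * h for p, h in zip(psi, hidden))
            alpha = 1.0 / (1.0 + math.exp(-score))
            attn_row.append(alpha)
            # Scale every channel of the skip feature by the coefficient.
            gated_row.append([alpha * v for v in x])
        gated.append(gated_row)
        attn.append(attn_row)
    return gated, attn


# Toy 1x2 feature map with 2 channels; identity weights chosen for readability.
identity = [[1.0, 0.0], [0.0, 1.0]]
skip = [[[1.0, 2.0], [0.0, 0.0]]]
gate = [[[1.0, 0.0], [0.0, 0.0]]]
gated, attn = attention_gate(skip, gate, identity, identity, [1.0, 1.0])
print(attn)   # the first pixel gets a higher coefficient than the second
print(gated)
```

In a trained network the weights would be learned, so that pixels belonging to the moving target receive coefficients near 1 and background pixels are suppressed before the skip features are merged into the decoder.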