Achieving high detection accuracy for pavement cracks with complex textures under varying lighting conditions remains challenging. This study proposes CrackResAttentionNet, an encoder-decoder network in which a position attention module and a channel attention module are connected after each encoder to capture long-range contextual information. On the public dataset, CrackResAttentionNet with the BCE loss function and the PReLU activation function outperformed the other popular models compared (ENet, ExFuse, FCN, LinkNet, SegNet, and UNet), achieving the best precision (89.40), mean IoU (71.51), recall (81.09), and F1 (85.04). On a self-developed dataset (the Yantai dataset), the same configuration also performed better, with precision of 96.17, mean IoU of 83.69, recall of 93.44, and F1 of 94.79. In particular, with BCE loss and PReLU activation, precision improved by 3.21 on the public dataset; on the Yantai dataset, precision improved by 0.99, mean IoU by 0.74, recall by 1.1, and F1 by 1.24.
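For readers less familiar with the four reported metrics, the following is a minimal sketch of how precision, recall, F1, and mean IoU are commonly computed for binary crack segmentation. It is an illustration under stated assumptions, not the evaluation code used in this study: the function names (`confusion_counts`, `segmentation_metrics`, `f1_from`) are hypothetical, masks are assumed to be flat sequences of 0/1 pixel labels, and mean IoU is assumed to be the average of the crack-class and background-class IoU, which may differ from the definition the authors used.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two flat sequences of 0/1 pixel labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1          # crack pixel correctly detected
        elif p and not t:
            fp += 1          # background pixel labelled as crack
        elif not p and t:
            fn += 1          # crack pixel missed
        else:
            tn += 1          # background pixel correctly rejected
    return tp, fp, fn, tn


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def segmentation_metrics(pred, truth):
    """Return precision, recall, F1, and two-class mean IoU as fractions."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # IoU per class: intersection over union of predicted and true regions.
    iou_crack = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    iou_background = tn / (tn + fp + fn) if tn + fp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "mean_iou": (iou_crack + iou_background) / 2,
    }


if __name__ == "__main__":
    # Toy 8-pixel example (assumed data, for illustration only).
    pred = [1, 1, 0, 0, 1, 0, 0, 0]
    truth = [1, 0, 0, 0, 1, 1, 0, 0]
    print(segmentation_metrics(pred, truth))
    # Consistency check against the figures reported in the abstract.
    print(round(f1_from(89.40, 81.09), 2), round(f1_from(96.17, 93.44), 2))
```

The reported F1 values are consistent with the reported precision and recall under the harmonic-mean definition: 89.40 and 81.09 give 85.04 on the public dataset, and 96.17 and 93.44 give 94.79 on the Yantai dataset.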