Existing models remain limited in their ability to capture fine-grained contextual information when processing architectural images. This paper introduces a model for architectural image segmentation and retrieval built on an image segmentation network. First, spatial attention is incorporated into the U-Net segmentation network to strengthen image feature extraction. Second, a dual-path attention mechanism is integrated into the U-Net backbone, allowing information from different spatial positions and scales to be fused seamlessly. Experimental results demonstrate the superior performance of the proposed model on the test set, with an average Dice coefficient, accuracy, and recall of 94.67%, 95.61%, and 97.88%, respectively, outperforming the comparative models. The proposed model strengthens the U-Net network's ability to identify targets within feature maps. Combining image segmentation networks with attention mechanisms in this way enables precise segmentation and retrieval of architectural images.
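To make the architectural idea concrete, the sketch below shows one way spatial attention could be attached to a U-Net skip connection before the encoder features are concatenated with the decoder path. The paper does not specify the exact attention formulation or layer configuration, so the CBAM-style spatial attention, the module names, and the channel counts here are illustrative assumptions rather than the authors' implementation; the dual-path variant would add a second (e.g. channel-wise) branch alongside this one.

```python
# Minimal sketch (assumption): a CBAM-style spatial attention block applied to
# U-Net skip-connection features before they are fused with the decoder path.
# Channel counts and placement are illustrative, not the paper's exact design.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Reweights each spatial location of a feature map with a learned mask."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # 2 input channels: per-pixel average and max taken over the channel axis.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_pool = torch.mean(x, dim=1, keepdim=True)      # (N, 1, H, W)
        max_pool, _ = torch.max(x, dim=1, keepdim=True)    # (N, 1, H, W)
        mask = self.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * mask                                     # attention-weighted features


class AttentiveSkipDecoderBlock(nn.Module):
    """One U-Net decoder step: upsample, attend to the skip features, then fuse."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.attn = SpatialAttention()
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)
        skip = self.attn(skip)   # suppress irrelevant regions in the skip path
        return self.fuse(torch.cat([x, skip], dim=1))


if __name__ == "__main__":
    block = AttentiveSkipDecoderBlock(in_ch=256, skip_ch=128, out_ch=128)
    decoder_feat = torch.randn(1, 256, 32, 32)   # coarse decoder features
    encoder_skip = torch.randn(1, 128, 64, 64)   # matching encoder skip features
    print(block(decoder_feat, encoder_skip).shape)  # torch.Size([1, 128, 64, 64])
```

Applying the attention mask to the skip features, rather than to the upsampled decoder features, is one common design choice: it filters the high-resolution encoder information so that only regions relevant to the segmentation target are passed forward for fusion.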