We address unsupervised image-to-image translation, specifically the task of applying different visual styles to a given content image. Popular single-image translation techniques suffer from poor output quality, excessive image noise, and discrepancies between generated images and human perception. To overcome these problems, we propose a dual-branch attention-guided approach to high-quality single-image translation. Our method employs a multiscale pyramid structure in which the generator performs image transformation after the input image is downsampled. In addition, we introduce dual-branch spatial attention modules and hybrid convolution modules to improve the quality of generated images, mitigate noise, and align the results more closely with human visual perception. These modules strengthen the focus on the main subject while minimizing interference from background information. Comprehensive experimental validation and comparisons on benchmark datasets, including the Terra Cotta Warriors dataset, affirm the effectiveness of our method. Specifically, our method significantly improves performance, lowering the single-image Fréchet inception distance to 1.83, which indicates superior performance compared with state-of-the-art approaches.
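The multiscale pyramid mentioned above can be illustrated with a minimal sketch. The abstract does not state the scale factor, the resampling filter, or the number of levels, so the factor of 2, the 2x2 average pooling, and the names `downsample_2x` and `build_pyramid` are all assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def downsample_2x(image: np.ndarray) -> np.ndarray:
    """Halve height and width by averaging non-overlapping 2x2 blocks.

    Assumption: factor-2 average pooling stands in for whatever
    resampling the actual method uses.
    """
    h, w = image.shape[:2]
    if h < 2 or w < 2:
        raise ValueError("image is too small to downsample")
    h2, w2 = h - h % 2, w - w % 2  # crop odd edges so blocks tile exactly
    cropped = image[:h2, :w2]
    # Split rows and columns into (block index, offset within block).
    blocks = cropped.reshape(h2 // 2, 2, w2 // 2, 2, *image.shape[2:])
    return blocks.mean(axis=(1, 3))


def build_pyramid(image: np.ndarray, num_scales: int) -> list:
    """Return num_scales versions of the image, coarsest first.

    Hypothetical helper: a coarse-to-fine generator would process
    the first element before moving to finer levels.
    """
    if num_scales < 1:
        raise ValueError("num_scales must be at least 1")
    levels = [image.astype(np.float64)]
    for _ in range(num_scales - 1):
        levels.append(downsample_2x(levels[-1]))
    return levels[::-1]


if __name__ == "__main__":
    rgb = np.random.default_rng(0).random((64, 48, 3))
    for level in build_pyramid(rgb, num_scales=4):
        print(level.shape)
```

In a coarse-to-fine setup of this kind, the coarsest level typically fixes the global layout and the finer levels add detail; where the attention and hybrid convolution modules sit relative to these levels is not specified in the abstract.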