Image-to-image translation, whose purpose is to generate images close to those of a target domain while preserving the salient features of the source-domain images, is one of the most challenging topics in artificial intelligence, and various generative adversarial networks have been developed for it. ARDA-UNIT, presented in this paper, addresses the main challenges of these networks: producing high-quality images in a reasonable amount of time and transferring content between two images with different structures. The proposed recurrent dense self-attention block, applied in the latent space of ARDA-UNIT's generator, simultaneously increases its generative capability and reduces the number of trainable parameters. ARDA-UNIT also has a feature extraction module that feeds both the generator and the discriminator. This module uses a new adaptive feature fusion method that combines multi-scale features while preserving the characteristics of each scale, as well as a pre-trained CNN that further reduces the trainable parameters. Moreover, a feature similarity loss is introduced that guides the model to change the structure of the source domain in accordance with that of the target domain. Experiments on different datasets, evaluated with the FID, KID, and IS criteria, show that the model reduces computational load, transfers structures well, and achieves higher image quality.
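The abstract does not give the exact formulation of the feature similarity loss; one plausible minimal sketch, assuming it compares feature vectors extracted from the generated and target-domain images (the function names and the cosine-based form here are illustrative assumptions, not the paper's definition), is:

```python
def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def feature_similarity_loss(gen_feats, target_feats):
    """Hypothetical loss: 1 minus the mean cosine similarity over paired
    feature vectors, so identical features give a loss of zero and
    dissimilar structures are penalized."""
    sims = [cosine_similarity(g, t) for g, t in zip(gen_feats, target_feats)]
    return 1.0 - sum(sims) / len(sims)

# Identical feature sets yield zero loss; orthogonal features yield loss 1.
print(feature_similarity_loss([[1.0, 0.0], [0.0, 1.0]],
                              [[1.0, 0.0], [0.0, 1.0]]))
print(feature_similarity_loss([[1.0, 0.0]], [[0.0, 1.0]]))
```

Such a term would be minimized when the generator's output shares the target domain's structural features, which matches the role the abstract assigns to this loss.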