With the maturation of digital radio frequency memory (DRFM) technology, a variety of intra-pulse retransmission interference methods have emerged. Because these methods are flexible and changeable, they pose significant challenges to radar detection, particularly on modern battlefields. This paper proposes an attention-guided complex-valued transformer (AGCT) to suppress such interference. First, the encoder maps the received signal, which is contaminated by interference and noise, into a high-dimensional space. Then, the dilated convolution block (DCB) group and the attention block (AB) group in the mask estimator extract the fine-grained multi-scale features and the large-scale features of the interference, respectively, to obtain a multidimensional space mask. Finally, the decoder restores the interference to the time domain, and the network outputs the estimated target echo using residual learning. Considering the characteristics of intra-pulse interference, we introduce an energy attention block (EAB) at the end of the DCBs and the ABs, which sharpens the network's focus on interference features. Furthermore, we adopt a curriculum learning strategy during training, which gradually adapts the network to different types of retransmission interference, progressing from simpler to more complex scenarios. Extensive experiments under various conditions show that the AGCT outperforms the comparison network. Its advantage is most pronounced under harsher conditions, demonstrating its robustness and effectiveness.
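The encoder, mask estimator, decoder, and residual steps described above can be illustrated with a minimal NumPy sketch of the data flow. It is not the AGCT itself. The function `suppress_interference`, the linear encoder and decoder matrices, and the callable `mask_fn` are hypothetical stand-ins for the learned complex-valued blocks. The sketch also assumes that residual learning means subtracting the estimated interference from the received signal, and that the mask is applied element-wise.

```python
import numpy as np

def suppress_interference(received, enc, dec, mask_fn):
    """Toy mask-based suppression on a complex 1-D signal.

    received: complex array of shape (n,), echo + interference + noise.
    enc:      complex array of shape (d, n), stand-in for the encoder.
    dec:      complex array of shape (n, d), stand-in for the decoder.
    mask_fn:  callable returning a mask with the same shape as the features
              (stand-in for the DCB/AB mask estimator).
    """
    features = enc @ received               # map the signal into a higher-dimensional space
    mask = mask_fn(features)                # estimate which features belong to interference
    interference = dec @ (mask * features)  # bring the masked features back to the time domain
    return received - interference          # assumed residual step: echo = received - interference

rng = np.random.default_rng(0)
n, d = 64, 256
received = rng.standard_normal(n) + 1j * rng.standard_normal(n)
enc = rng.standard_normal((d, n)) + 1j * rng.standard_normal((d, n))
dec = np.linalg.pinv(enc)  # left inverse of enc, so dec @ enc is the identity

# An all-zero mask removes nothing; an all-one mask removes the whole signal.
echo_mask_zero = suppress_interference(received, enc, dec, lambda f: np.zeros_like(f))
echo_mask_one = suppress_interference(received, enc, dec, lambda f: np.ones_like(f))
```

The two limiting masks bracket the behavior: a useful mask estimator has to select only the interference-dominated features, which is the role the abstract assigns to the DCB and AB groups.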