Image super‐resolution (SR) is widely applied in remote sensing to generate high‐resolution (HR) images without increasing hardware costs. SR is, however, a severely ill‐posed problem. Deep learning methods have alleviated this problem to a certain extent, but the complex spatial distribution of remote sensing images still makes it difficult to extract abundant high‐frequency details effectively. Here, a single‐image super‐resolution (SISR) network for remote sensing based on the generative adversarial network (GAN), called JOA‐GAN, is presented. First, a joint‐attention module (JOA) is proposed that focuses the network on high‐frequency regions of remote sensing images to improve reconstruction quality. Second, in the generator network, a multi‐scale densely connected feature extraction block (ERRDB) is proposed, which acquires features at different scales using MSconv blocks containing multi‐scale convolutions and automatically adjusts those features with JOA. Third, in the discriminator network, a relative discriminator computes the relative probability instead of the absolute probability, which helps the network learn clearer and more realistic texture details. JOA‐GAN is compared with other advanced methods, and the results show that it improves objective evaluation metrics and achieves superior visual quality.
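The idea of a relative (rather than absolute) probability in the discriminator can be illustrated with a small sketch. The abstract does not give the loss, so the code below assumes the relativistic average formulation commonly used in GAN-based SR, in which each real image is scored against the mean score of the generated images and vice versa. The function names (`relativistic_probs`, `discriminator_loss`, `generator_loss`) and the `eps` parameter are hypothetical and are not taken from JOA‐GAN.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mean(values):
    """Arithmetic mean of a non-empty sequence of floats."""
    return sum(values) / len(values)


def relativistic_probs(real_logits, fake_logits):
    """Assumed relativistic average formulation (not confirmed by the source).

    p_real[i]: probability that real sample i is more realistic than the
               average generated sample.
    p_fake[j]: probability that generated sample j is more realistic than
               the average real sample.
    """
    mean_real = mean(real_logits)
    mean_fake = mean(fake_logits)
    p_real = [sigmoid(r - mean_fake) for r in real_logits]
    p_fake = [sigmoid(f - mean_real) for f in fake_logits]
    return p_real, p_fake


def discriminator_loss(real_logits, fake_logits, eps=1e-12):
    """The discriminator wants p_real -> 1 and p_fake -> 0."""
    p_real, p_fake = relativistic_probs(real_logits, fake_logits)
    loss_real = -mean([math.log(p + eps) for p in p_real])
    loss_fake = -mean([math.log(1.0 - p + eps) for p in p_fake])
    return loss_real + loss_fake


def generator_loss(real_logits, fake_logits, eps=1e-12):
    """The generator wants the opposite: p_real -> 0 and p_fake -> 1."""
    p_real, p_fake = relativistic_probs(real_logits, fake_logits)
    loss_real = -mean([math.log(1.0 - p + eps) for p in p_real])
    loss_fake = -mean([math.log(p + eps) for p in p_fake])
    return loss_real + loss_fake


if __name__ == "__main__":
    # Raw (pre-sigmoid) discriminator scores for a toy batch.
    real = [2.0, 1.5, 2.5]
    fake = [-1.0, -0.5, -1.5]
    print("D loss:", discriminator_loss(real, fake))
    print("G loss:", generator_loss(real, fake))
```

In this sketch the generator loss depends on the scores of both real and generated images, whereas a standard GAN generator loss depends only on the generated ones. That coupling is one plausible reading of why a relative discriminator helps the network learn sharper textures.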