Easy and efficient acquisition of high-resolution remote sensing images is important in geographic information systems. Deep neural networks composed of convolutional layers have previously achieved impressive progress in super-resolution reconstruction. However, the inherent limitations of convolutional layers, including the difficulty of modeling long-range dependencies, constrain the super-resolution performance of these networks. To address these problems, we propose a generative adversarial network (GAN), called SWCGAN, that combines the advantages of the Swin Transformer and convolutional layers. Unlike previous super-resolution models composed of pure convolutional blocks, the essential idea behind the proposed method is to generate high-resolution images with a generator network that hybridizes convolutional and Swin Transformer layers, and then to use a pure Swin Transformer discriminator network for adversarial training. In the proposed method, (1) we employ a convolutional layer for shallow feature extraction that adapts to flexible input sizes; (2) we propose the residual dense Swin Transformer block (RDSTB) to extract deep features, which are then upsampled to generate high-resolution images; and (3) we use a simplified Swin Transformer as the discriminator for adversarial training. To evaluate the performance of the proposed method, we compare it with other state-of-the-art methods on the UCMerced benchmark dataset and apply it to real-world remote sensing images. The results demonstrate that the proposed method outperforms other state-of-the-art methods in most reconstruction metrics.
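
As a rough illustration of the generator pipeline described above (shallow convolutional feature extraction, stacked RDSTBs for deep features, and upsampling to the high-resolution output), the following PyTorch-style sketch shows one possible data flow. All module names, hyperparameters, and RDSTB internals here are assumptions for illustration only; in particular, a plain TransformerEncoderLayer stands in for the windowed Swin Transformer layers, and the dense connectivity is modeled on the residual dense block pattern.

```python
# Minimal sketch of the generator described in the abstract (assumed details).
import torch
import torch.nn as nn


class SwinLayerStandIn(nn.Module):
    """Placeholder for a Swin Transformer layer: plain self-attention over
    flattened spatial tokens (window partitioning/shifting omitted)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=2 * channels, batch_first=True,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        tokens = self.block(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class RDSTB(nn.Module):
    """Residual dense Swin Transformer block (assumed structure): densely
    connected swin-style layers, 1x1 feature fusion, local residual."""

    def __init__(self, channels: int, num_layers: int = 4):
        super().__init__()
        self.reducers = nn.ModuleList(
            nn.Conv2d(channels * (i + 1), channels, 1) for i in range(num_layers)
        )
        self.layers = nn.ModuleList(
            SwinLayerStandIn(channels) for _ in range(num_layers)
        )
        self.fuse = nn.Conv2d(channels * (num_layers + 1), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for reduce, layer in zip(self.reducers, self.layers):
            # Each layer sees all previous features (dense connectivity).
            feats.append(layer(reduce(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual


class GeneratorSketch(nn.Module):
    """Shallow conv -> stacked RDSTBs -> pixel-shuffle upsampling."""

    def __init__(self, channels: int = 64, num_blocks: int = 4, scale: int = 4):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)  # flexible input size
        self.deep = nn.Sequential(*[RDSTB(channels) for _ in range(num_blocks)])
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        shallow = self.shallow(lr)
        return self.upsample(shallow + self.deep(shallow))   # global residual


if __name__ == "__main__":
    sr = GeneratorSketch()(torch.randn(1, 3, 48, 48))
    print(sr.shape)  # torch.Size([1, 3, 192, 192])
```

In the actual method, the stand-in attention layer would be replaced by windowed and shifted-window attention as in the Swin Transformer, and the discriminator would be a separate, simplified Swin Transformer network trained adversarially against this generator.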