Sonar images are inherently affected by speckle noise, which degrades image quality and hinders image exploitation. Despeckling is an important pre-processing step that removes such noise to improve the accuracy of downstream analysis tasks on sonar images. In this paper, we propose SID-TGAN, a novel transformer-based generative adversarial network for sonar image despeckling. In the SID-TGAN framework, transformer and convolutional blocks extract global and local features, respectively, which are integrated into the generator and discriminator networks for feature fusion and enhancement. Through adversarial training, SID-TGAN learns more comprehensive representations of sonar images and achieves strong despeckling performance. In addition, SID-TGAN introduces a new adversarial loss function that combines image content, local texture style, and global similarity to reduce image distortion and information loss during training. Finally, we compare SID-TGAN with state-of-the-art despeckling methods on one image dataset with synthetic optical noise and four real sonar image datasets. The results show that SID-TGAN achieves significantly better despeckling performance than existing methods on all five datasets.