The emergence of deep learning techniques has revolutionized the use of machine learning algorithms to classify complicated environments, notably in remote sensing. Convolutional Neural Networks (CNNs) have shown considerable promise in classifying challenging high-dimensional remote sensing data, particularly in the classification of wetlands. State-of-the-art Natural Language Processing (NLP) algorithms, on the other hand, are transformers. Despite the fact that transformers have been utilized for a few remote sensing applications, they have not been compared to other well-known CNN networks in complex wetland classification. As such, for the classification of complex coastal wetlands in the study area of Saint John city, located in New Brunswick, Canada, we modified and employed the Swin Transformer algorithm. Moreover, the developed transformer classifier results were compared with two well-known deep CNNs of AlexNet and VGG-16. In terms of average accuracy, the proposed Swin Transformer algorithm outperformed the AlexNet and VGG-16 techniques by 14.3% and 44.28%, respectively. The proposed Swin Transformer classifier obtained F-1 scores of 0.65, 0.71, 0.73, 0.78, 0.82, 0.84, and 0.84 for the recognition of coastal marsh, shrub, bog, fen, aquatic bed, forested wetland, and freshwater marsh, respectively. The results achieved in this study suggest the high capability of transformers over very deep CNN networks for the classification of complex landscapes in remote sensing.