“…The Transformer (Vaswani et al., 2017), a novel neural network architecture, was first applied to natural language processing (NLP) tasks, such as machine translation and English constituency parsing, where it achieved significant improvements. In computer vision, Transformer-based models rely mainly on self-attention, their key module, to extract intrinsic features, and they show great potential in artificial intelligence applications such as high-resolution image synthesis (Dalmaz, Yurt & Ukur, 2021), object detection (Carion et al., 2020), classification (Yuan et al., 2021), segmentation (Zheng et al., 2021), image processing (Lin et al., 2021), and re-identification (Luo et al., 2020). Furthermore, CNNs have long been considered the fundamental component of vision applications, but the Transformer is now emerging as a potential replacement for them.…”