The Transformer [35] architecture has achieved remarkable performance on many important Natural Language Processing (NLP) tasks, and its robustness has accordingly been studied in the NLP setting. Several works [12,17,19,22,29,43] conducted adversarial attacks on transformers, including pretrained models; in their experiments, transformers usually exhibit better robustness than architectures such as LSTMs or CNNs, with a theoretical explanation provided in [17]. However, due to the discrete nature of text inputs, these studies focus on discrete perturbations (e.g., word or character substitutions), which are very different from the small, continuous perturbations considered in computer vision tasks.
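To make this contrast concrete, the two threat models can be written in a standard form (the notation below is illustrative and is not taken from the cited works). For a token sequence $x = (x_1, \dots, x_n)$, a discrete attack searches over a substitution set
\[
\mathcal{S}(x) = \bigl\{\, x' \;:\; x'_i \in \mathrm{Sub}(x_i) \ \text{for all } i,\ \lVert x' - x \rVert_0 \le k \,\bigr\},
\]
where $\mathrm{Sub}(x_i)$ denotes admissible replacements for token $x_i$ (e.g., synonyms) and $k$ bounds the number of substituted positions. In contrast, for an image $x \in [0,1]^d$, a continuous attack perturbs within a small $\ell_\infty$ ball,
\[
\mathcal{B}_\epsilon(x) = \bigl\{\, x + \delta \;:\; \lVert \delta \rVert_\infty \le \epsilon \,\bigr\},
\]
so the perturbation can be made arbitrarily small in magnitude, whereas a single word or character substitution is an inherently non-infinitesimal change to the input.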