“…Recently, the natural language processing model Transformer [76] has gained much popularity in the computer vision community. When used in vision problems such as image classification [66,19,84,56,45,55,75], object detection [6,53,74,56], segmentation [84,99,56,4] and crowd counting [47,69], it learns to attend to important image regions by exploring the global interactions between different regions. Due to its impressive performance, Transformer has also been introduced for image restoration [9,5,82].…”