Recently, Transformer-based architectures (Mozafari et al., 2019; Aluru et al., 2020; Samghabadi et al., 2020; Salminen et al., 2020; Qian et al., 2021; Kennedy et al., 2020; Arviv et al., 2021) have achieved significant improvements over RNN and CNN models (Zhang et al., 2016; Gambäck and Sikdar, 2017; Del Vigna et al., 2017; Park and Fung, 2017). To mitigate the need for extensive annotation, some works use transformers to generate additional training samples (e.g., Vidgen et al., 2020b; Wullach et al., 2020, 2021). Zhou et al. (2021) integrate features from external resources to improve model performance.