Our work combines multiple sizeable research areas, to the extent that they have respectively produced several surveys (Fortuna and Nunes, 2018; Gunasekara and Nejadgholi, 2018; Mishra et al., 2019; Banko et al., 2020; Madukwe et al., 2020; Muneer and Fati, 2020; Salawu et al., 2020; Jahan and Oussalah, 2021; Mladenovic et al., 2021). Recent cyberbullying work (Reynolds et al., 2011; Xu et al., 2012; Nitta et al., 2013; Bretschneider et al., 2014; Dadvar et al., 2014; Van Hee et al., 2015, among other seminal works) has primarily focused on deploying Transformer-based models (Vaswani et al., 2017), by and large fine-tuning (e.g., Swamy et al., 2019; Paul and Saha, 2020; Gencoglu, 2021) or re-training (Caselli et al., 2020) BERT. It is worth noting that Elsafoury et al. (2021a; 2021b) show that although fine-tuning BERT achieves state-of-the-art classification performance, its attention scores do not correlate with cyberbullying features, and they expect the generalization of such models to be subpar.
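As a point of reference, the fine-tuning recipe the cited works broadly follow can be sketched as below. This is a minimal illustration rather than any particular paper's setup: the model checkpoint, the two-class label scheme, the toy examples, and the learning rate are all assumptions made for the sketch.

```python
# Minimal sketch (not the authors' implementation) of fine-tuning a pretrained
# BERT encoder with a classification head on labelled cyberbullying data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # hypothetical label set: 0 = not bullying, 1 = bullying
)

# Toy labelled examples standing in for a real cyberbullying corpus.
texts = ["You are awesome!", "Nobody likes you, just leave."]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # illustrative learning rate

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss over the two classes
outputs.loss.backward()                  # gradients flow through the whole encoder (fine-tuning)
optimizer.step()
```

Because the entire encoder is updated rather than frozen, the classifier can reach strong in-domain scores, which is consistent with the state-of-the-art results reported above even when, as Elsafoury et al. (2021a; 2021b) observe, the learned attention patterns do not align with cyberbullying-specific features.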