“…Debias Hate Speech Detection
Recent work (Yin and Zubiaga, 2021; Wiegand et al., 2019; Kennedy et al., 2020; Ma et al., 2020; Gehman et al., 2020; Dreier et al., 2022; Stanovsky et al., 2019; Thakur et al., 2023; Ziems et al., 2023) has studied the generalizability and biases of hate speech detection models (Talat et al., 2018; AlKhamissi et al., 2022; Röttger et al., 2022; Bianchi et al., 2022). For instance, prior work found that existing hate speech detection models are biased against speakers of African American Vernacular English (Harris et al., 2022b; Sap et al., 2019) and that certain identity terms are strongly correlated with hateful labels (Bender et al., 2021; ElSherief et al., 2021a).…”