“…The latter issue is particularly relevant to hate speech detection since current hate speech datasets vary in data source, sampling strategy and annotation process (Vidgen and Derczynski, 2020; Poletto et al., 2020), and are known to exhibit annotator biases (Waseem, 2016; Waseem et al., 2018; Sap et al., 2019) as well as topic and author biases (Wiegand et al., 2019; Nejadgholi and Kiritchenko, 2020). Correspondingly, models trained on such datasets have been shown to be overly sensitive to lexical features such as group identifiers (Park et al., 2018; Dixon et al., 2018; Kennedy et al., 2020), and to generalise poorly to other datasets (Nejadgholi and Kiritchenko, 2020; Samory et al., 2020). Therefore, held-out performance on current hate speech datasets is an incomplete and potentially misleading measure of model quality.…”