Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate

Kirk, Hannah Rose; Vidgen, Bertram; Röttger, Paul; Thrush, Tristan; Hale, Scott A.

doi:10.18653/v1/2022.naacl-main.97

Cited by 20 publications

(15 citation statements)

References 39 publications

“…Fairness measures were very diverse, including, for example, equalized odds (Wang et al, 2020b), demographic parity (Coston et al, 2020), equal opportunity (Cotter et al, 2019), individual fairness (Black et al, 2020), and calibration by group (Petersen et al, 2023). Capabilities included generalization (Wu et al, 2020), calibration (Hendrycks et al, 2019b), handling of linguistic phenomena (Naik et al, 2018), level of bias (Nangia et al, 2020), reasoning (Liu et al, 2019a), and task-specific capabilities, e.g., recognizing emoji-based hate (Kirk et al, 2022).…”

Section: Quantitative Results (mentioning)

confidence: 99%

“…We say a paper in our survey evaluates a specification if it measures it. I.e., the paper either proposes a new method of how to evaluate a specification (e.g., by designing a test suite (Kirk et al, 2022) or a metric (Weng et al, 2018)) or studies a previously proposed specification as part of the evaluation (in the simplest case just reports its outcome).…”

Section: Discussion (mentioning)

confidence: 99%

Specification Overfitting in Artificial Intelligence

Roth, de Araujo, Xia et al. 2024

Preprint

Machine learning (ML) and artificial intelligence (AI) approaches are often criticized for their inherent bias and for their lack of control, accountability, and transparency. Consequently, regulatory bodies struggle with containing this technology's potential negative side effects. High-level requirements such as fairness and robustness need to be formalized into concrete specification metrics, imperfect proxies that capture isolated aspects of the underlying requirements. Given possible trade-offs between different metrics and their vulnerability to over-optimization, integrating specification metrics in system development processes is not trivial. This paper defines specification overfitting, a scenario where systems focus excessively on specified metrics to the detriment of high-level requirements and task performance. We present an extensive literature survey to categorize how researchers propose, measure, and optimize specification metrics in several AI fields (e.g., natural language processing, computer vision, reinforcement learning). Using a keyword-based search on papers from major AI conferences and journals between 2018 and mid-2023, we identify and analyze 74 papers that propose or optimize specification metrics. We find that although most papers implicitly address specification overfitting (e.g., by reporting more than one specification metric), they rarely discuss which role specification metrics should play in system development or explicitly define the scope and assumptions behind metric formulations.

“…Multi-source AL for NLP. While AL has been studied for a variety of tasks in NLP (Siddhant and Lipton, 2018; Lowell et al, 2019; Ein-Dor et al, 2020; Shelmanov et al, 2021; Margatina et al, 2021; Yuan et al, 2022; Schröder et al, 2022; Margatina et al, 2022; Kirk et al, 2022; Zhang et al, 2022), the majority of work remains limited to settings where training data is assumed to stem from a single source. Some recent works have sought to address the issues that arise when relaxing the single-source assumption (Ghorbani et al, 2021;, though results remain primarily limited to image classification.…”

Section: Related Work (mentioning)

confidence: 99%

Investigating Multi-source Active Learning for Natural Language Inference

Snijders, Margatina 2023

Preprint

In recent years, active learning has been successfully applied to an array of NLP tasks. However, prior work often assumes that training and test data are drawn from the same distribution. This is problematic, as in real-life settings data may stem from several sources of varying relevance and quality. We show that four popular active learning schemes fail to outperform random selection when applied to unlabelled pools comprised of multiple data sources on the task of natural language inference. We reveal that uncertainty-based strategies perform poorly due to the acquisition of collective outliers, i.e., hard-to-learn instances that hamper learning and generalization. When outliers are removed, strategies are found to recover and outperform random baselines. In further analysis, we find that collective outliers vary in form between sources, and show that hard-to-learn data is not always categorically harmful. Lastly, we leverage dataset cartography to introduce difficulty-stratified testing and find that different strategies are affected differently by example learnability and difficulty.

“…Works such as Prabhakaran et al (2019) and Hutchinson et al (2020) partially mitigate this by using real-world data and targeting specific syntactic slots for substitution, but this can yield incoherent or contradictory text when there are multiple entities referenced in a sentence. Finally, recent works with templates such as and Kirk et al (2021) have been effective at detailing problems with modern toxicity classifiers, by investing significant targeted effort into probing task-specific functionality, and employing human validation for generated examples.…”

Section: Counterfactual Generation (mentioning)

confidence: 99%

Flexible text generation for counterfactual fairness probing

Fryer, Packer, Beutel et al. 2022

Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates, producing simple counterfactuals that don't take into account grammar, context, or subtle sensitive attribute references, and could miss issues that the wordlist creators had not considered. In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to make progress on this task. We show that this LLM-based method can produce complex counterfactuals that existing methods cannot, comparing the performance of various counterfactual generation methods on the Civil Comments dataset and showing their value in evaluating a toxicity classifier.

* Work done as a Google AI Resident.

Original: True and the same goes with headscarves . Its not religious requirement but a cultural choice. Simple otherwise there would be no Muslim woman that don't wear them and clearly there are.

Counterfactual: True and the same goes with yarmulkes . Its not a religious requirement but a cultural choice. Simple otherwise there would be no Jewish man that don't wear them and clearly there are.
