2020
DOI: 10.48550/arxiv.2012.15859
Preprint

Intrinsic Bias Metrics Do Not Correlate with Application Bias

Abstract: Natural Language Processing (NLP) systems learn harmful societal biases that cause them to widely proliferate inequality as they are deployed in more and more situations. To address and combat this, the NLP community relies on a variety of metrics to identify and quantify bias in black-box models and to guide efforts at debiasing. Some of these metrics are intrinsic, measured in word embedding spaces, and some are extrinsic, measuring the bias present downstream in the tasks that the word embedding…

Cited by 8 publications (11 citation statements) · References 19 publications
“…Silva, Tambwekar, and Gombolay (2021), for instance, find that (at least when using contextualized embedding models) WEAT estimates poorly predict bias estimated by other measures and are even internally inconsistent. Goldfarb-Tarrant et al. (2020) find that estimates of the bias present in word embeddings (such as those produced by the WEAT) do not meaningfully correlate with downstream biases of applications using those embeddings. Finally, terms in the semantic space estimated by word embeddings tend to cluster on non-intuitive dimensions such as term frequency (Arora, Liang, and Ma 2017; Mu, Bhat, and Viswanath 2017; Gong et al. 2018).…”
Section: Related Work
confidence: 73%
“…With the availability of fairness metrics, we also risk that such metrics are used as proof or as insurance that the models are unbiased, although most metrics can at best be considered indicators of bias (Goldfarb-Tarrant et al., 2020), especially since we found major limitations when comparing different metrics.…”
Section: Discussion and Ethical Considerations
confidence: 99%
“…career- and family-related words. This method relies on a vector representation for each word, which can be obtained in different ways in contextualized models, as we discuss in Section 3 and § 4.3. Finally, it should also be noted that WEAT serves as an indicator of bias, not a predictor (Goldfarb-Tarrant et al., 2020).…”
Section: Fairness in Word Embeddings
confidence: 99%
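The excerpts above repeatedly point to the WEAT as a representative intrinsic bias metric, i.e. one computed directly in the embedding space rather than on a downstream task. As a rough, hedged sketch of what such a metric looks like (not the implementation used by Goldfarb-Tarrant et al.), the snippet below computes a WEAT-style effect size; the embedding lookup `emb` and the word lists are hypothetical placeholders.

```python
# A minimal, illustrative sketch of a WEAT-style effect size (an intrinsic
# bias metric measured in the embedding space). `emb` is assumed to be a
# dict mapping words to numpy vectors, e.g. loaded from GloVe; the word
# lists in the usage example are placeholders, not taken from the paper.
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, emb):
    # s(w, A, B): how much more strongly w associates with attribute
    # set A than with attribute set B, on average.
    return (np.mean([cosine(emb[w], emb[a]) for a in A])
            - np.mean([cosine(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # Cohen's-d-style effect size comparing the two target sets X and Y.
    s_X = [association(x, A, B, emb) for x in X]
    s_Y = [association(y, A, B, emb) for y in Y]
    pooled_std = np.std(s_X + s_Y, ddof=1)
    return (np.mean(s_X) - np.mean(s_Y)) / pooled_std

# Hypothetical usage:
# X, Y = ["career", "salary"], ["home", "family"]   # target word sets
# A, B = ["he", "man"], ["she", "woman"]            # attribute word sets
# print(weat_effect_size(X, Y, A, B, emb))
```

The paper's central finding is that scores like the one above do not reliably predict the bias observed when the same embeddings are used in downstream applications, which is why the quoted works treat WEAT as an indicator rather than a predictor.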
“…; Xia et al. (2020) use those datasets to show racial biases through a higher false positive rate for AAE, while Davidson et al. (2019) use the dataset of Blodgett et al. (2016) for racial bias evaluation by comparing the probabilities of tweets from different social groups being predicted as hate speech. Davani et al. (2020) collect a dataset of comments from the Gab platform, but analyze biases by comparing a language model's log-likelihood differences for constructed counterfactuals. Goldfarb-Tarrant et al. (2020) add gender labels to the dataset from Founta et al. (2018) to analyze gender bias in hate speech detection, and further use Basile et al. (…
confidence: 99%