Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.150

Intrinsic Bias Metrics Do Not Correlate with Application Bias

Abstract: Natural Language Processing (NLP) systems learn harmful societal biases that cause them to amplify inequality as they are deployed in more and more situations. To guide efforts at debiasing these systems, the NLP community relies on a variety of metrics that quantify bias in models. Some of these metrics are intrinsic, measuring bias in word embedding spaces, and some are extrinsic, measuring bias in downstream tasks that the word embeddings enable. Do these intrinsic and extrinsic metrics correlate with each other? …

Cited by 61 publications (69 citation statements) · References 34 publications
“…However, there are several applications of the counterfactual data augmentation strategies from §4.1.1 in this setting: for example, Garg et al. (2019) construct counterfactuals by swapping lists of "identity terms", with the goal of reducing bias in text classification, and Zhao et al. (2018) swap gender markers such as pronouns and names for coreference resolution. Counterfactual data augmentation has also been applied to reduce bias in pre-trained contextualized word embedding models (e.g., Huang et al., 2019; Maudslay et al., 2019), but the extent to which biases in pretrained models propagate to downstream applications remains unclear (Goldfarb-Tarrant et al., 2021). Fairness applications of the distributional criteria discussed in §4.1.2 are relatively rare, but Adragna et al. (2020) show that invariant risk minimization (Arjovsky et al., 2019), which attempts to learn an invariant predictor across multiple "environments", can reduce the use of spurious correlations with race on the Civil Comments dataset (Borkan et al., 2019) for toxicity detection.…”
Section: Fairness and Bias (citation type: mentioning)
confidence: 99%
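The term-swapping augmentation described in the excerpt above can be sketched in a few lines. This is a minimal illustration with a hand-written swap dictionary; the `SWAPS` table and `augment` helper are illustrative assumptions, not the term lists or code of Garg et al. (2019) or Zhao et al. (2018).

```python
# Minimal counterfactual data augmentation: for each training sentence,
# emit a counterfactual copy with identity/gender terms swapped.
# The swap list is illustrative; published work uses larger curated lists
# and also handles capitalization and ambiguous forms (e.g., "her").
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "hers", "hers": "his",
    "man": "woman", "woman": "man",
}

def swap_terms(tokens):
    """Return the counterfactual token sequence under the swap dictionary."""
    return [SWAPS.get(tok.lower(), tok) for tok in tokens]

def augment(corpus):
    """Yield each original sentence followed by its counterfactual twin."""
    for tokens in corpus:
        yield tokens
        yield swap_terms(tokens)

if __name__ == "__main__":
    corpus = [["he", "is", "a", "doctor"], ["she", "liked", "her", "job"]]
    for sent in augment(corpus):
        print(" ".join(sent))
```

The augmented corpus is then used to retrain the classifier or embedding model, which is the debiasing step whose downstream effect the cited paper calls into question.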
“…We thus call for future studies on the validity of word embedding bias measures. Fourth, Goldfarb-Tarrant et al. (2021) argue that intrinsic (word-embedding) biases sometimes fail to agree with extrinsic biases (measured in downstream tasks, e.g., coreference resolution).…”
Section: Conclusion and Discussion (citation type: mentioning)
confidence: 98%
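The disagreement noted here is, at bottom, an empirical question about correlation: score each trained model with an intrinsic metric and an extrinsic metric, then test whether the two series move together. A minimal sketch using SciPy, assuming per-model scores are already computed; the score arrays below are placeholders, not values from any cited paper.

```python
# Sketch: check whether an intrinsic bias metric tracks an extrinsic one
# across a collection of trained models. The arrays are placeholders.
from scipy.stats import pearsonr, spearmanr

intrinsic_scores = [0.12, 0.30, 0.25, 0.08, 0.41]  # e.g., embedding-space bias per model
extrinsic_scores = [0.05, 0.11, 0.02, 0.09, 0.07]  # e.g., downstream performance gap per model

r, r_p = pearsonr(intrinsic_scores, extrinsic_scores)
rho, rho_p = spearmanr(intrinsic_scores, extrinsic_scores)
print(f"Pearson r = {r:.2f} (p = {r_p:.2f}), Spearman rho = {rho:.2f} (p = {rho_p:.2f})")
# A weak or unstable correlation across many models is the pattern
# Goldfarb-Tarrant et al. (2021) report.
```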
“…A related line of research measures bias present in sentence or word representations (Bolukbasi et al., 2016; Caliskan et al., 2017; Kurita et al., 2019; Sedoc and Ungar, 2019; Chaloner and Maldonado, 2019; Dev and Phillips, 2019; Gonen and Goldberg, 2019; Hall Maudslay et al., 2019; Liang et al., 2020; Shin et al., 2020; Papakyriakopoulos et al., 2020). However, such intrinsic metrics have recently been shown not to correlate with application bias (Goldfarb-Tarrant et al., 2021). In yet another line of research, Badjatiya et al. (2019) detect bias by identifying bias-sensitive words.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
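Many of the intrinsic metrics cited in this excerpt are association tests over embedding vectors. Below is a minimal WEAT-style sketch in the spirit of Caliskan et al. (2017), assuming `emb` maps words to vectors; the word lists and random vectors are illustrative only, not a reproduction of any published evaluation.

```python
# WEAT-style association: how much more strongly are target words associated
# with one attribute set than the other, measured by cosine similarity.
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B, emb):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    """Cohen's-d style effect size over the two target word sets X and Y."""
    s_x = [association(x, A, B, emb) for x in X]
    s_y = [association(y, A, B, emb) for y in Y]
    pooled_std = np.std(s_x + s_y, ddof=1)
    return (np.mean(s_x) - np.mean(s_y)) / pooled_std

# Illustrative usage with random vectors standing in for a trained embedding.
rng = np.random.default_rng(0)
vocab = ["doctor", "nurse", "he", "she", "man", "woman"]
emb = {w: rng.normal(size=50) for w in vocab}
print(weat_effect_size(["doctor"], ["nurse"], ["he", "man"], ["she", "woman"], emb))
```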
“…Developing such understanding is crucial for drawing reliable conclusions and actionable recommendations regarding bias. We focus on bias measurement for downstream tasks, as Goldfarb-Tarrant et al. (2021) have recently shown that there is no reliable correlation between bias measured intrinsically on, for example, word embeddings, and bias measured extrinsically on a downstream task. We narrow down the scope of this paper to tasks that do not involve prediction of a sensitive attribute.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
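An extrinsic (downstream) bias measure of the kind discussed in this last excerpt typically compares a task metric across demographic subsets of the evaluation data. A minimal sketch of one common choice, a false-positive-rate gap for a toxicity classifier; the group labels and predictions below are illustrative assumptions, not the metric or data of any particular cited paper.

```python
# Sketch of an extrinsic bias measure: the false-positive-rate gap of a
# toxicity classifier between demographic groups. Data is illustrative.
def false_positive_rate(labels, preds):
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    negatives = sum(1 for y in labels if y == 0)
    return fp / negatives if negatives else 0.0

def fpr_gap(examples):
    """examples: list of (group, gold_label, predicted_label) triples."""
    by_group = {}
    for group, gold, pred in examples:
        golds, preds = by_group.setdefault(group, ([], []))
        golds.append(gold)
        preds.append(pred)
    rates = {g: false_positive_rate(golds, preds) for g, (golds, preds) in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

examples = [
    ("group_a", 0, 0), ("group_a", 0, 1), ("group_a", 1, 1),
    ("group_b", 0, 0), ("group_b", 0, 0), ("group_b", 1, 1),
]
gap, rates = fpr_gap(examples)
print(rates, "gap =", round(gap, 2))
```

A gap near zero means the classifier flags non-toxic text at similar rates across groups; it is this kind of application-level measurement that the cited paper finds poorly predicted by intrinsic embedding metrics.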