Proceedings of the Seventh Joint Conference on Lexical And Computational Semantics 2018
DOI: 10.18653/v1/s18-2005
Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems

Abstract: Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems. Further, there is no benchmark dataset for examining inappropriate biases in systems. Here, for the first time, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. We use the dataset to examine 219 automatic sentiment analysis systems.
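The abstract describes a benchmark built by instantiating emotion-bearing sentence templates with race- and gender-associated first names, then comparing a system's scores across the resulting variants. Below is a minimal sketch of that construction pattern; the templates and name lists are illustrative stand-ins, not the actual EEC contents.

```python
from itertools import product

# Illustrative emotion-bearing templates; the real EEC uses its own set.
TEMPLATES = [
    "{person} feels angry.",
    "The conversation with {person} was heartbreaking.",
]

# Illustrative name lists; the EEC draws its names from prior studies
# of race- and gender-associated first names.
NAMES = {
    ("female", "African American"): ["Ebony", "Latisha"],
    ("female", "European"): ["Amanda", "Courtney"],
    ("male", "African American"): ["Darnell", "Jamel"],
    ("male", "European"): ["Adam", "Frank"],
}

def generate_corpus():
    """Yield (sentence, gender, race) for every template/name combination."""
    for template, ((gender, race), names) in product(TEMPLATES, NAMES.items()):
        for name in names:
            yield template.format(person=name), gender, race

for sentence, gender, race in generate_corpus():
    print(f"{gender:6s} {race:18s} {sentence}")
```

Scoring each generated sentence with a system under test and comparing the score distributions across name groups is then what surfaces the biases the paper reports.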

Cited by 298 publications (308 citation statements) · References 25 publications
“…Regard: Sentiment scores capture differences in language polarity and have been used to quantify bias (Kiritchenko and Mohammad, 2018), but there has been little analysis on the correlation of sentiment to human judgment of bias. Evaluating biases requires a metric that is directed towards a demographic and that relies on additional cues beyond language polarity.…”
[Footnote 1: https://github.com/ewsheng/nlg-bias. Footnote 2: To constrain the scope of our analysis, we limit each demographic type to two classes, which, while unrepresentative of real-world diversity, allows us to focus on more depth in analysis.]
Section: Definitions (mentioning)
Confidence: 99%
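The excerpt describes quantifying bias by comparing sentiment scores across demographic variants of otherwise identical sentences. Here is a minimal sketch of that probe, using NLTK's VADER analyzer purely as a stand-in for the systems under test; the sentence pairs are invented for illustration.

```python
# Requires: pip install nltk; then nltk.download("vader_lexicon") once.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Invented sentence pairs, identical except for the gendered word.
pairs = [
    ("My sister made me feel miserable.", "My brother made me feel miserable."),
    ("The woman was furious.", "The man was furious."),
]

for female_variant, male_variant in pairs:
    gap = (analyzer.polarity_scores(female_variant)["compound"]
           - analyzer.polarity_scores(male_variant)["compound"])
    print(f"{gap:+.3f}  {female_variant!r} vs. {male_variant!r}")

# A gap that is consistently nonzero across many such pairs suggests the
# scorer treats otherwise-identical sentences differently by gender; as
# the excerpt notes, polarity alone may still miss biases that human
# judges would flag.
```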
“…Our computational approach allows us to revisit this type of work (§3) using FOOTBALL, without relying on subjective human coding. Within NLP, researchers have studied gender bias in word embeddings (Bolukbasi et al., 2016; Caliskan et al., 2017), racial bias in police stops (Voigt et al., 2017) and on Twitter (Hasanuzzaman et al., 2017), and biases in NLP tools like sentiment analysis systems (Kiritchenko and Mohammad, 2018). Especially related to our work is that of Ananya et al. (2019), who analyze mention-level gender bias, and Fu et al. (2016), who examine gender bias in tennis broadcasts.…”
Section: Related Work (mentioning)
Confidence: 96%
“…We refer to changing the gender of the gendered nouns as gender-swapping. Gender-swapping can be generalized to sentences by swapping each male-definitional word with its respective female equivalent and vice versa (Zhao et al., 2018a; Lu et al., 2018; Kiritchenko and Mohammad, 2018). If the model does not make decisions based on genders, it should perform equally for both sentences.…”
Section: Measuring Performance Differences Across Genders (mentioning)
Confidence: 99%
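A minimal sketch of the gender-swapping procedure this excerpt describes, using a tiny illustrative word list. Real implementations (e.g., Zhao et al., 2018a) use much larger curated lists and handle ambiguities such as "her" mapping to either "him" or "his", which a naive lookup table like this one cannot resolve.

```python
import re

# Tiny illustrative subset of a male-/female-definitional word list.
PAIRS = [("he", "she"), ("his", "her"), ("man", "woman"),
         ("brother", "sister"), ("father", "mother"), ("son", "daughter")]
SWAP = {m: f for m, f in PAIRS}
SWAP.update({f: m for m, f in PAIRS})

def gender_swap(sentence: str) -> str:
    """Swap gendered words in both directions in a single pass.

    Doing all swaps in one re.sub call prevents cascading, i.e. a word
    swapped to its counterpart cannot be swapped back later in the pass.
    """
    def replace(match):
        word = match.group(0)
        swapped = SWAP[word.lower()]
        # Preserve capitalization of the original token.
        return swapped.capitalize() if word[0].isupper() else swapped

    pattern = r"\b(" + "|".join(SWAP) + r")\b"
    return re.sub(pattern, replace, sentence, flags=re.IGNORECASE)

print(gender_swap("He thanked his sister."))  # -> She thanked her brother.
```

Running a model on both the original and the swapped sentence and comparing its outputs is the performance-difference test the excerpt refers to.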