Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.44

Unsupervised Discovery of Implicit Gender Bias

Abstract: Despite their prevalence in society, social biases are difficult to identify, primarily because human judgements in this domain can be unreliable. We take an unsupervised approach to identifying gender bias against women at a comment level and present a model that can surface text likely to contain bias. Our main challenge is forcing the model to focus on signs of implicit bias, rather than other artifacts in the data. Thus, our methodology involves reducing the influence of confounds through propensity matching…
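
The abstract mentions reducing the influence of confounds through propensity matching. As a rough illustration only, and not the paper's implementation, the sketch below shows a minimal propensity-score matching step; the function name, the confound features, the treatment definition, and the caliper value are all assumptions made for this example.

```python
# Illustrative sketch ONLY (not the paper's implementation): a toy
# propensity-score matching step for reducing the influence of a confound
# before modeling bias. The confound features, treatment definition,
# function name, and caliper are all assumptions for this example.
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_match(confound_features, treatment, caliper=0.05):
    """Greedy 1:1 matching of treated to control comments on propensity scores.

    confound_features: (n, d) array of confound covariates per comment
    treatment: (n,) binary array, e.g. 1 = comment addressed to a woman
    Returns a list of (treated_index, control_index) pairs.
    """
    # Estimate P(treatment | confounds) with a simple logistic model.
    propensity = LogisticRegression(max_iter=1000).fit(
        confound_features, treatment
    ).predict_proba(confound_features)[:, 1]

    treated = np.where(treatment == 1)[0]
    control = np.where(treatment == 0)[0]
    used, pairs = set(), []
    for t in treated:
        # Match each treated comment to the nearest unused control comment,
        # but only if the propensity scores are within the caliper.
        dists = np.abs(propensity[control] - propensity[t])
        for j in np.argsort(dists):
            c = control[j]
            if c not in used and dists[j] <= caliper:
                used.add(c)
                pairs.append((t, c))
                break
    return pairs
```

Pairs matched this way are approximately balanced on the confound, so a downstream classifier trained on the matched data is less able to use the confound as a shortcut for predicting the treatment label.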

Cited by 31 publications (28 citation statements)
References 38 publications
“…But veiled offenses are not represented in existing toxicity datasets (Waseem and Hovy, 2016; Davidson et al., 2017; Founta et al., 2018) and building a new dataset is expensive: candidates for annotation cannot be filtered through lexicons and random sampling of social media posts will surface only a tiny fraction of relevant examples (Breitfeller et al., 2019). Moreover, since biased text is often unconscious and subjective, untrained annotators might mislabel it due to their own biases (Breitfeller et al., 2019; Field and Tsvetkov, 2020).…”
Section: Toxic Language in Disguise (mentioning)
confidence: 99%
“…One straightforward reason is that implicit stereotypes persist in our society. Even if explicit gender discrimination occurs less frequently today than in the past, implicit attitudes about females being submissive and less worthy than males remain pervasive (Field & Tsvetkov, 2020; Storage et al., 2020). Consistent with this possibility is the observation that males are considered more prototypical than females when categorizing humans (Bailey et al., 2020).…”
Section: Patterns of Gender Representation Explained (mentioning)
confidence: 97%
“…Thus, in many existing datasets implicit abuse can be found in only a small proportion of instances (Wiegand et al., 2019). Furthermore, implicit abuse presents additional challenges to human annotators as sometimes specific background knowledge and experience are required in order to understand the hidden meaning behind implicit statements (Sap et al., 2020; Breitfeller et al., 2019; Field & Tsvetkov, 2020). To deal effectively with this class of abuse, annotated datasets focusing on implicitly abusive language are needed so that automatic detection systems are exposed to a wide variety of such examples through their training data (Wiegand et al., 2021).…”
Section: Professional Responsibility (mentioning)
confidence: 99%