2018
DOI: 10.1007/978-3-030-01768-2_27

Biased Embeddings from Wild Data: Measuring, Understanding and Removing

Abstract: Many modern Artificial Intelligence (AI) systems make use of data embeddings, particularly in the domain of Natural Language Processing (NLP). These embeddings are learnt from data that has been gathered "from the wild" and have been found to contain unwanted biases. In this paper we make three contributions towards measuring, understanding and removing this problem. We present a rigorous way to measure some of these biases, based on the use of word lists created for social psychology applications; we observe …
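The measurement the abstract describes builds on curated word lists of the kind used in social-psychology studies of implicit association. As an illustration only, the Python sketch below computes a WEAT-style association score (in the spirit of Caliskan et al., 2017): the difference in mean cosine similarity between a target word and two attribute word lists. The embedding lookup emb, the helper names, and the example word lists are assumptions made for this sketch, not the authors' actual lists, data, or code.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(word, attr_a, attr_b, emb):
    # Mean similarity of `word` to attribute list A minus its mean
    # similarity to attribute list B; positive means closer to A.
    sim_a = np.mean([cosine(emb[word], emb[a]) for a in attr_a])
    sim_b = np.mean([cosine(emb[word], emb[b]) for b in attr_b])
    return sim_a - sim_b

def bias_score(targets, attr_a, attr_b, emb):
    # Average association of a whole target word list with the two
    # attribute lists; larger magnitude indicates a stronger skew.
    return float(np.mean([association(t, attr_a, attr_b, emb) for t in targets]))

# Hypothetical word lists in the style of social-psychology stimuli.
# `emb` would be a dict-like mapping from word to vector, e.g. loaded
# from pre-trained GloVe or word2vec files.
female_attrs = ["she", "woman", "her", "daughter", "mother"]
male_attrs = ["he", "man", "his", "son", "father"]
occupation_targets = ["nurse", "housekeeper", "engineer", "pilot"]

A positive bias_score for the occupation targets, with female_attrs as list A, would indicate that on average these occupation vectors sit closer to the female attribute words than to the male ones.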



Cited by 15 publications (13 citation statements)
References 19 publications
“…This imbalance in representation can have a dramatic downstream effect on NLP systems trained on such a corpus, such as giving preference to male engineers over female engineers in an automated resumé filtering system. Gender stereotypes of this sort have been observed in word embeddings (Bolukbasi et al., 2016; Sutton et al., 2018), contextual word embeddings (Zhao et al., 2019), and co-reference resolution systems (Rudinger et al., 2018; Zhao et al., 2018) inter alia.…”
Section: Gender Stereotypes in Text (mentioning)
confidence: 99%
“…This is because NLP systems depend on language corpora, which are inherently "not objective; they are creations of human design" (Crawford, 2013). One type of societal bias that has received considerable attention from the NLP community is gender stereotyping (Garg et al., 2017; Rudinger et al., 2017; Sutton et al., 2018). Gender stereotypes can manifest in language in overt ways.…”
Section: Introduction (mentioning)
confidence: 99%
“…Gender bias has been detected, studied, and partially addressed for standard and contextualized word embeddings in a number of studies (Bolukbasi et al., 2016; Caliskan et al., 2017; Sutton et al., 2018; Basta et al., 2019; Garg et al., 2018; Zhao et al., 2018, 2019). These studies showed that training word embeddings on large human produced corpora such as news text leads to encoding societal biases including gender and race.…”
Section: Related Work (mentioning)
confidence: 99%
“…These algorithms extract not only semantic and syntactic information from everyday language, but also subtle biases in the usage of words [41]. For example, the words nurse or housekeeper frequently take a more feminine position in the semantic space than the words pilot or engineer, which take a more masculine position [41,42].…”
Section: Introduction (mentioning)
confidence: 99%
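The pattern described in the statement above, with occupation words such as "nurse" occupying a more feminine position than "pilot" or "engineer", is often made concrete by projecting word vectors onto a gender direction. The sketch below is offered only as an illustration: it uses the she-minus-he difference vector as a crude one-pair gender axis, emb is an assumed word-to-vector mapping, and published analyses usually average several definitional pairs rather than relying on a single one.

import numpy as np

def gender_projection(word, emb):
    # Signed projection of a normalized word vector onto the normalized
    # she-minus-he axis: positive values sit toward "she", negative toward "he".
    axis = emb["she"] - emb["he"]
    axis = axis / np.linalg.norm(axis)
    vec = emb[word] / np.linalg.norm(emb[word])
    return float(np.dot(vec, axis))

# Example usage (assumes `emb` has been loaded beforehand, for instance
# with gensim's KeyedVectors.load_word2vec_format on pre-trained vectors):
# for w in ["nurse", "housekeeper", "pilot", "engineer"]:
#     print(w, gender_projection(w, emb))

On typical news-trained embeddings the first two words tend to receive higher (more feminine) scores than the last two, which is the effect the quoted statement describes.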