2016
DOI: 10.48550/arxiv.1607.06520
Preprint

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Abstract: The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Ge…

Cited by 75 publications (115 citation statements)
References: 23 publications
“…Nonetheless, CNA provides a new means for preserving the integrity of the model by matching the feature manifold using a contrastive term that has not been previously explored. Potential Negative Societal Impacts: CNA may accidentally transfer the bias in the source feature embedding [5] to the target model. As a result it may unintentionally amplify these biases when deploying the model.…”
Section: Discussion (mentioning)
confidence: 99%
“…to avoid trivial solutions where every sample is isolated. Implementations: For Isomap, LLE and Hessian LLE, the only free parameter is the number of neighbors K. We set it to 10 on synthetic data and sweep over [5,10,50,100,200] on real-world datasets. For MVU and CNA, we use a 3-layer multi-layer perceptron (MLP) with Tanh activation function as the inductive projection model.…”
Section: Manifold Learning/Dimensionality Reduction (mentioning)
confidence: 99%
“…To address our research questions, we adopted a novel methodology for quantifying the content of obesity-related language in news media, based on word embedding analysis (Word2Vec). Word embedding [43], is a neural network approach capable of learning distributed representations of words from a set of documents. It draws upon vector space mathematics to reveal hidden biases in language.…”
Section: Methods (mentioning)
confidence: 99%
“…Use of word embedding techniques to capture social and intersectional bias is gaining attention. Bolukbasi et al [43], for example, used word embedding to examine the association between occupations and gender roles in Google News articles. More recently, Arseniev-Koehler et al [46] used word embedding to extract cultural schemata about body weight from New York Times articles.…”
Section: Methods (mentioning)
confidence: 99%
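
Illustrative note: the citation statements above describe using vector arithmetic on word embeddings to examine associations between occupations and gender. A minimal sketch of that idea follows. It assumes a single he/she difference vector as the gender direction, which is a simplification and may differ from the procedure in the cited paper. The 3-dimensional vectors and the helper names `unit` and `gender_projection` are invented for illustration and are not taken from Google News embeddings or any library.

```python
import numpy as np

# Toy 3-dimensional vectors, invented purely for illustration.
# Real embeddings (e.g. Word2Vec) typically have hundreds of dimensions.
toy_vectors = {
    "he": np.array([1.0, 0.2, 0.0]),
    "she": np.array([-1.0, 0.2, 0.0]),
    "programmer": np.array([0.5, 0.8, 0.1]),
    "homemaker": np.array([-0.6, 0.7, 0.2]),
    "teacher": np.array([0.0, 0.9, 0.3]),
}


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def gender_projection(word, vectors):
    """Cosine of the angle between a word vector and the he-she direction.

    Positive values lean toward "he", negative values toward "she".
    """
    direction = unit(vectors["he"] - vectors["she"])
    return float(np.dot(unit(vectors[word]), direction))


scores = {
    word: gender_projection(word, toy_vectors)
    for word in ("programmer", "homemaker", "teacher")
}

for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

With real pretrained vectors, the same projection could be applied to a list of occupation words to see which ones lie closest to either end of the chosen direction.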