2019
DOI: 10.3758/s13428-019-01234-0
Gender bias at scale: Evidence from the usage of personal names

Abstract: Recent research within the computational social sciences has shown that when computational models of lexical semantics are trained on standard natural-language corpora, they embody many of the implicit biases that are seen in human behavior (Caliskan, Bryson, & Narayanan, 2017). In the present study, we aimed to build on this work and demonstrate that there is a large and systematic bias in the use of personal names in the natural-language environment, such that male names are much more prevalent than female n…

Cited by 17 publications (19 citation statements: 1 supporting, 18 mentioning, 0 contrasting)
References 48 publications
“…The results of this analysis suggest that there are gender differences in the usage of language (at least on a very large scale; see Johns & Dye, 2019, for a more specific case of gender differences in language usage), and that there are some differences in lexical behaviour that are reflective of the gender discrepancy, such that a variable trained on a corpus of female authors outperforms a variable trained only on male authors, for female subjects. However, the best fitting model is still the one trained on all authors, suggesting that people have a mix of experience with the writings of both female and male authors.…”
Section: Results (mentioning)
confidence: 95%
“…Johns and Jamieson (2018) recently demonstrated that there is meaningful semantic variation at both the book and individual author level. In addition, more recent research has demonstrated that there are systematic differences in language usage based on the demographic characteristics of authors, such as gender (Johns & Dye, 2019) and time and place of birth (Johns & Jamieson, 2019). Thus, both measures will be modified with an SD transformation, using computational techniques adapted from the semantic distinctiveness model (SDM; Johns et al, 2012; Jones et al, 2012).…”
Section: Corpus-Based Measures of Prevalence (mentioning)
confidence: 99%
“…By contrast, as is true with other languages (Rayner, 1998, 2009), the best predictors of when the eyes move are linguistic variables such as word frequency (Hermena et al, 2019), cloze predictability (Hermena, Bouamama, Liversedge, & Drieghe, in press) and the number of letters within a word (Hermena et al, 2017, Experiment 2; see also Paterson, Almabruk, McGowan, White, & Jordan, 2015). Interestingly, because most Arabic words are 6–9 letters long, they typically receive more, longer fixations (e.g., Hermena et al, 2015; Hermena et al, 2017; Hermena et al, in press) than is observed in the reading of English, where average word length is approximately five letters (Brysbaert, 2019; Johns & Dye, 2019). These results support a clear dissociation between the factors that determine where readers move their eyes (e.g., visual acuity and saccade‐programming parameters, as modulated by the spatial layout of a text) and the factors that determine when readers move their eyes (e.g., rate of lexical processing as modulated by a variety of linguistic properties of the text).…”
Section: The Arabic Language and Writing System (mentioning)
confidence: 99%
“…). Because fiction is easier to read than non-fiction, we can investigate whether this coincides with a difference in word length and what impact this would predict for reading rate. Average word length is indeed shorter in English fiction than in non-fiction (4.2 letters vs. 4.6 letters, based on the billion-word corpora of Johns & Dye, 2019). If we assume that the 238 wpm silent reading rate estimate from the meta-analysis mainly comes from non-fiction (expository texts), we can use the equation 238 * 4.6/4.2 to see what average reading rate would be predicted for fiction.…”
(mentioning)
confidence: 99%
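The word-length adjustment in the last excerpt is simple arithmetic and can be checked directly. The figures below (238 wpm non-fiction baseline, 4.6 vs. 4.2 average letters per word) are taken from the quoted text; the variable names are illustrative only:

```python
# Predicted silent reading rate for fiction, scaling the non-fiction
# estimate by the ratio of average word lengths (figures quoted above).
nonfiction_wpm = 238   # silent reading rate estimate from the meta-analysis
nonfiction_len = 4.6   # average letters per word, English non-fiction
fiction_len = 4.2      # average letters per word, English fiction

fiction_wpm = nonfiction_wpm * nonfiction_len / fiction_len
print(round(fiction_wpm))  # ~261 wpm
```

The scaling assumes reading rate in words per minute varies inversely with average word length, i.e., that letters per minute stays roughly constant across text types.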