2021
DOI: 10.5195/jmla.2021.1252

Using genderize.io to infer the gender of first names: how to improve the accuracy of the inference

Abstract: Objective: We recently showed that genderize.io is not a sufficiently powerful gender detection tool due to a large number of nonclassifications. In the present study, we aimed to assess whether the accuracy of inference by genderize.io can be improved by manipulating the first names in the database. Methods: We used a database containing the first names, surnames, and gender of 6,131 physicians practicing in a multicultural country (Switzerland). We uploaded the original CSV file (file #1), the file obtained a…

Cited by 24 publications (22 citation statements)
References: 6 publications
“…In addition, genderize.io has been found by independent researchers to have an error rate comparable to other published gender prediction methods, with an error rate on predicted names below 6% [31, 32]. However, it should be noted that the error rate varies by name origin, with the largest decrease in performance on names with an Asian origin [31, 32].…”
Section: Results (mentioning)
confidence: 97%
“…Some approaches use other features instead of whole names. For example, Jensen et al. [17] use n-grams of letters within names; Sebo [19] transforms names to conform to the reference data set by removing diacritics and the second part of compound names; and others bring in additional data besides names alone [20-23]. These approaches can improve accuracy, but ultimately information-theoretic limits prevent these algorithms from outperforming the Bayes error rate of their design (e.g., Leslies in Utah in 2015 may have a different proportion of women than Leslies overall, but the core problem remains unchanged: some will be misgendered).…”
Section: Methods and Limits of Imputation (mentioning)
confidence: 99%
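
The name transformation described in the statement above (removing diacritics and dropping the second part of compound names) can be sketched roughly as follows. This is an illustrative sketch only, not the procedure used in the cited study: the function names are hypothetical, and treating hyphens and whitespace as the separators of a compound name is an assumption.

```python
import re
import unicodedata


def strip_diacritics(name: str) -> str:
    """Remove combining accent marks, e.g. 'José' -> 'Jose'.

    Letters with no Unicode decomposition (such as 'ø') are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def first_part(name: str) -> str:
    """Keep only the first part of a compound name.

    Assumption: compound names are joined by hyphens or whitespace.
    """
    parts = re.split(r"[\s\-]+", name.strip())
    return parts[0]


def normalize_first_name(name: str) -> str:
    """Hypothetical helper: apply both transformations before a lookup."""
    return first_part(strip_diacritics(name))


if __name__ == "__main__":
    for raw in ["Jean-François", "José", "Anne Marie", "Zoë"]:
        print(raw, "->", normalize_first_name(raw))
```

Whether such a transformation helps depends on how the reference data set stores names, so it is best checked against a sample with known gender.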
“…Santamaría and Mihaljević (2018) and Sebo (2021a) extensively review these and other gender detection tools. In Genderize.Io, we used the technique recommended by Sebo (2021b) to improve accuracy. Generally, these tools report the proportion and number of times a name is associated with men or women, alongside the number of examples checked.…”
Section: Methods (mentioning)
confidence: 99%
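
As the statement above notes, these tools report a proportion for the inferred gender together with the number of examples checked. A minimal sketch of how such a result might be turned into a classification or a nonclassification is shown below. The record layout (`gender`, `probability`, `count`) and both thresholds are assumptions chosen for illustration, not values taken from the cited papers or a documented API contract.

```python
from typing import Optional, TypedDict


class NameEstimate(TypedDict):
    """Hypothetical record for one name returned by a gender detection tool."""
    name: str
    gender: Optional[str]   # e.g. "male", "female", or None if unknown
    probability: float      # proportion of examples with the reported gender
    count: int              # number of examples checked


def accept_gender(
    estimate: NameEstimate,
    min_probability: float = 0.8,  # illustrative threshold (assumption)
    min_count: int = 10,           # illustrative threshold (assumption)
) -> Optional[str]:
    """Return the inferred gender, or None to mark a nonclassification."""
    if estimate["gender"] is None:
        return None
    if estimate["probability"] < min_probability:
        return None
    if estimate["count"] < min_count:
        return None
    return estimate["gender"]


if __name__ == "__main__":
    example: NameEstimate = {
        "name": "andrea", "gender": "female", "probability": 0.6, "count": 5000,
    }
    print(accept_gender(example))  # below the probability threshold
```

Stricter thresholds lower the error rate on classified names at the cost of more nonclassifications, which is the trade-off the abstract describes.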