2018
DOI: 10.7717/peerj-cs.156

Comparison and benchmark of name-to-gender inference services

Abstract: The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person's gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set …

Cited by 297 publications (281 citation statements)
References: 12 publications
“…“Andrea B.” would become “Andrea”). Gender API predicts gender using a database of >2 million name-gender relationships retrieved from governmental records and data crawled from social networks (Santamaria and Mihaljevic, 2018). The service accepts parameters for localization, which we included from our previously defined dataset of author countries.…”
Section: Methods (mentioning)
confidence: 99%
“…For 3 298 951 papers that were published between 2008 and 2016 and could be matched with PubMed we assigned gender of first and last authors, which can be considered in medicine as dominant authorship positions [29], using their names, according to our gender assignment algorithm [30]. More details on the algorithm, which has also been used by Santamaría and Mihaljević [31] and Karimi and colleagues [32], can be found in the supplementary materials of our previous work.…”
mentioning
confidence: 99%
“…For speakers invited to the ICM 2018 in Rio de Janeiro, we extracted their names, country of citizenship and the ICM sections of their talks from the official ICM-2018 website. We used Python package gender-guesser², which has shown very reliable results in a recent benchmark on name-based gender inference [17], to infer the gender³ of the speakers using their forenames when this information was missing. For speakers whose names are not highly correlated with only one gender (across different countries and languages), and for which gender-guesser hence did not produce a definite gender assignment, we filled this information manually, mainly based on field knowledge and Internet research.…”
Section: Data Basis and Methods (mentioning)
confidence: 99%
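
The workflow described in the statement above (automatic inference from forenames, with manual completion where no definite result is returned) can be sketched roughly as follows. This is an illustration only, not the citing authors' code: the names `split_by_certainty` and `toy_guess` are hypothetical, the lookup table is made-up sample data, and treating only "male" and "female" as definite is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: only these two labels count as a definite assignment.
DEFINITE = {"male", "female"}


def split_by_certainty(
    forenames: Iterable[str], guess: Callable[[str], str]
) -> Tuple[Dict[str, str], List[str]]:
    """Separate names with a definite label from names needing manual review."""
    assigned: Dict[str, str] = {}
    needs_review: List[str] = []
    for name in forenames:
        label = guess(name)
        if label in DEFINITE:
            assigned[name] = label
        else:
            # Ambiguous or unknown names are left for manual completion.
            needs_review.append(name)
    return assigned, needs_review


def toy_guess(name: str) -> str:
    """Hypothetical stand-in classifier backed by a tiny sample table."""
    lookup = {"maria": "female", "john": "male", "andrea": "andy"}
    return lookup.get(name.lower(), "unknown")


assigned, needs_review = split_by_certainty(
    ["Maria", "John", "Andrea", "Xyz"], toy_guess
)
print(assigned)
print(needs_review)
```

With the gender-guesser package installed, `gender_guesser.detector.Detector().get_gender` could be passed as `guess` in place of the toy classifier; it returns labels such as "male", "female", "mostly_male", "mostly_female", "andy", and "unknown". Whether the "mostly" labels should count as definite is a choice the sketch leaves to the user.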