Performance of gender detection tools: a comparative study of name-to-gender inference services

Sebo, Paul

doi:10.5195/jmla.2021.1185

Cited by 114 publications

(83 citation statements)

References 19 publications


“…We recommend using genderize.io only with files that were modified in this way, as the proportion of nonclassifications was very high in file #1 (naCoded 16.4%). By comparing the results obtained with this double manipulation of first names with those already published in our earlier study [ 9 ], we observe that genderize.io is almost as efficient as Gender API (errorCoded 1.8%) and NamSor (errorCoded 2.0%), the two gender detection tools that were shown to be the most powerful.…”

Section: Discussion (mentioning)

confidence: 60%

“…However, this is a multicultural and multilingual country, and nationalize.io showed multiple origins of the first names, even though almost half (i.e. 47%) were of French- or English-speaking origin [ 9 ]. Although the results of this study may be generalizable to most Western names, with other names, for example Asian or Middle Eastern, the effectiveness of the method used in the study is yet to be demonstrated.…”

Section: Discussion (mentioning)

confidence: 99%

“…For this study, we used the same database of physicians that we used in our earlier study [ 9 ]. This database consisted of 6,264 physicians, 50.4% of whom were women.…”

Section: Methods (mentioning)

confidence: 99%

“…We recently showed that Gender API [ 4 ] and NamSor [ 8 ] are the most powerful tools for determining the gender of individuals [ 9 ]. By contrast, genderize.io [ 5 ] does not perform well due to a large number of unclassified first names.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, genderize.io offers researchers a significant advantage over the other two gender detection tools in that it allows researchers to upload a file of 1,000 first names every day for free (i.e., to perform 30,000 queries per month), whereas Gender API is only free up to 500 requests per month and NamSor up to 5,000 requests per month. One way to improve the quality of inference by genderize.io is to use a second gender detection tool for unrecognized first names [ 9 ]. Although potentially more accurate, this strategy is relatively time consuming, as it requires creating a new file of these first names and then submitting it to the second gender detection tool.…”

Section: Introduction (mentioning)

confidence: 99%


Using genderize.io to infer the gender of first names: how to improve the accuracy of the inference

Sebo

2021

jmla

Self Cite


Objective: We recently showed that genderize.io is not a sufficiently powerful gender detection tool due to a large number of nonclassifications. In the present study, we aimed to assess whether the accuracy of inference by genderize.io can be improved by manipulating the first names in the database.

Methods: We used a database containing the first names, surnames, and gender of 6,131 physicians practicing in a multicultural country (Switzerland). We uploaded the original CSV file (file #1), the file obtained after removing all diacritic marks, such as accents and cedilla (file #2), and the file obtained after removing all diacritic marks and retaining only the first term of the compound first names (file #3). For each file, we computed three performance metrics: proportion of misclassifications (errorCodedWithoutNA), proportion of nonclassifications (naCoded), and proportion of misclassifications and nonclassifications (errorCoded).

Results: naCoded, which was high for file #1 (16.4%), was reduced after data manipulation (file #2: 11.7%, file #3: 0.4%). As the increase in the number of misclassifications was small, the overall performance of genderize.io (i.e., errorCoded) improved, especially for file #3 (file #1: 17.7%, file #2: 13.0%, and file #3: 2.3%).

Conclusions: A relatively simple manipulation of the data improved the accuracy of gender inference by genderize.io. We recommend using genderize.io only with files that were modified in this way.
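The two name manipulations and the three metrics described in this abstract can be illustrated with a minimal Python sketch. It is not the author's code: the function names are hypothetical, the rule for splitting compound first names (on hyphens or whitespace) is an assumption, and so are the denominators (all names for naCoded and errorCoded, classified names only for errorCodedWithoutNA), since the abstract describes the metrics only in words.

```python
import re
import unicodedata


def strip_diacritics(name: str) -> str:
    """Remove accents, cedillas and other combining marks (file #2 style)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def first_term(name: str) -> str:
    """Keep only the first term of a compound first name (file #3 style).

    Assumption: compound names are separated by hyphens or whitespace.
    """
    return re.split(r"[\s\-]+", name.strip())[0]


def metrics(true_genders, predicted):
    """Compute the three metrics; a prediction of None is a nonclassification.

    Assumed denominators: all names for naCoded and errorCoded,
    classified names only for errorCodedWithoutNA.
    """
    total = len(true_genders)
    na = sum(1 for p in predicted if p is None)
    wrong = sum(
        1 for t, p in zip(true_genders, predicted) if p is not None and p != t
    )
    classified = total - na
    return {
        "errorCoded": (wrong + na) / total,
        "naCoded": na / total,
        "errorCodedWithoutNA": wrong / classified if classified else 0.0,
    }


# Illustrative data only, not taken from the study.
cleaned = first_term(strip_diacritics("Jean-François"))
scores = metrics(
    ["female", "male", "male", "female"],
    ["female", "female", None, "female"],
)
```

In this toy example one of four names is unclassified and one is misclassified, so naCoded is 0.25 and errorCoded is 0.5 under the assumed definitions.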


Trends in research approaches and gender in plant ecology dissertations over four decades

Poddar, Lam, Gurevitch

2024

Ecology and Evolution


Dissertations are a foundational scientific product; they are the formative product through which early‐career scientists create and share original knowledge. The methodological approaches used in dissertations vary with the research field. In plant ecology, these approaches include observations, experiments (field or controlled environment), literature reviews, theoretical approaches, or analyses of existing data (including “big data”). Recently, concerns have been raised about the rise of “big data” studies and the loss of observational and field‐based studies in ecology, but such trends have not been formally quantified. Therefore, we examined how the emphasis on each of these categories has changed over time and whether male and female authors differ in the methods employed. We found remarkable temporal consistency, with observational studies being dominant over the entire time span examined. There was an increase in the number of approaches employed per dissertation, with increases in analyses of databases and theoretical studies adding to rather than replacing traditional methodologies (like observations and field experiments). The representation of women increased over time. There were some differences in the approaches taken by men and women, which require further investigation.


Gender disparities in multiple myeloma publications

Dweik, Mian, et al.

2022

eJHaem
