2023
DOI: 10.1038/s41562-023-01587-9

Name-based demographic inference and the unequal distribution of misrecognition

Abstract: Academics and companies increasingly draw on large datasets to understand the social world, and name-based demographic ascription tools are widespread for imputing information like gender and race that are often missing from these large datasets. These approaches have drawn criticism on ethical, empirical, and theoretical grounds. Employing a survey of all authors listed on articles in sociology, economics, and communications journals in the Web of Science between 2015 and 2020, we compared self-identified dem…

Cited by 36 publications (17 citation statements)
References 45 publications
“…Because women of color have been historically excluded from academic positions, academia remains disproportionately white, and our analyses statistically reflect the experiences of white women. The accuracy of name-based algorithms for estimating race and/or ethnicity labels is uneven across groups (80), and the unreliability of these tools precludes estimating the attrition of women of color compared to white women, by field and institution. For similar reasons, we were unable to estimate the attrition of gender-diverse faculty, such as nonbinary faculty.…”
Section: Discussion (mentioning)
confidence: 99%
“…This type of classification is inherently biased unevenly across demographics and groups (please see Methods). 32 Using this methodology, we found that men make up the majority of K99 (n=2028, 58%) and R00 (n=1655, 58%) awardees. The same percentages of men and women K99 awardees convert their K99 awards to R00 awards (Table 2).…”
Section: Results (mentioning)
confidence: 94%
“…We scored gender based on personal communication with the author or an author's personal pronouns used in public venues such as blogs, websites, email signatures, and social media. Although time- and labor-intensive, this approach reduces error inherent in automated gender recognition algorithms (Lockhart et al 2023) and mitigates erasure of nonbinary and transgender people embedded in automated analysis of names (Keyes 2018). We could not verify gender for small proportion of authors (<0.1%); these were dropped.…”
Section: Methods (mentioning)
confidence: 99%
“…We did not attempt to code other axes of intersectional diversity (e.g., race, ethnicity, citizenship, romantic or sexual orientation, age, or ability/disability status). Without self-reporting, accurately coding these identity markers would be impossible (see Lockhart et al 2023).…”
Section: Methods (mentioning)
confidence: 99%