2018
DOI: 10.1038/sdata.2018.25

Demographic aspects of first names

Abstract: We introduce a list that offers information on the relation between first names and race or ethnicity. Drawing information from mortgage applications, the list includes 4,250 first names and information on their respective count and proportions across six mutually exclusive racial and Hispanic origin groups. These six categories are consistent with the categories used in the Census Bureau's list on surnames' demographic information. Also, just like the Census Bureau's list of surnames, the list of first names …

Citations: cited by 70 publications (73 citation statements)
References: 7 publications
“…In other countries, surname lists have been utilized to identify religious groups and those of Middle Eastern descent (Mateos, 2007). Others have created first-name based lists to identify R/E using first name and surname, finding that first names might more accurately identify White individuals than surnames (Tzioumis, 2018). As reference name lists continue to be created, Mateos (2007) cautions researchers to make sure that these lists were based on a large enough population to make valid inferences and to be aware of temporal differences, regional differences, differences in average ratio of people per surname, history of name adoption, and surnames reflecting only patrilineal descent.…”
Section: Results (mentioning)
confidence: 99%
“…All names appear in each source, except Maile does not appear for the last source. 23 For example, there is no Census or Social Security Administration tabulation of first names by race as there is for last names (Tzioumis 2018) and there is little information that suggests that Native American or Alaska Native first names are sufficiently common. Furthermore, no Alaska Native-specific names appear in the Social Security database in Alaska for the years 1985-1987.…”
Section: First Name As An Indigenous Signal (Native Hawaiian Only) (mentioning)
confidence: 99%
“…Because the data points do not have names associated with them, we generate synthetic first names using the race and gender attributes. First, we use the dataset of Tzioumis (2018) to identify "white" and "nonwhite" names. For each name, if the proportion of "white" people with that name is higher than 0.5, we deem the name to be "white;" otherwise, we deem it to be "non-white."…”
Section: Datasets (mentioning)
confidence: 99%
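The thresholding rule quoted in the statement above can be sketched in a few lines. This is only an illustration of the rule as worded in the quote: the function name `label_name`, the dictionary of names, and the proportions are hypothetical placeholders, not values from the Tzioumis (2018) list, and the proportions are assumed to be fractions in [0, 1].

```python
def label_name(prop_white: float, threshold: float = 0.5) -> str:
    """Label a name "white" only if its white proportion is strictly above the threshold."""
    return "white" if prop_white > threshold else "non-white"


# Hypothetical proportions for illustration only (not taken from the dataset).
example_props = {"name_a": 0.91, "name_b": 0.50, "name_c": 0.12}

labels = {name: label_name(p) for name, p in example_props.items()}
```

Because the quoted rule says "higher than 0.5", a name at exactly 0.5 falls on the "non-white" side in this sketch.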
“…And, because biographies are typically written in the third person and because pronouns are gendered in English, we can extract (likely) self-identified gender. We infer race for each data point by sampling from a Bernoulli distribution with probability equal to the average of the probability that an individual with that first name is "white" (from the dataset of Tzioumis (2018), using a threshold of 0.5, as described above) and the probability that an individual with that last name is "white" (from the dataset of Comenetz (2016), also using a threshold of 0.5). 3 Finally, like , we consider two versions of the Bios dataset: one where first names and pronouns are available to the classifier and one where they are "scrubbed.…”
Section: Datasets (mentioning)
confidence: 99%
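One possible reading of the sampling step quoted in the statement above is sketched below: average the first-name and last-name probabilities, then draw a single Bernoulli sample with that average as the success probability. The quote also mentions a 0.5 threshold, and how it interacts with the averaging is not clear from the excerpt, so this sketch leaves the threshold out. The function name `sample_is_white` and the input probabilities are hypothetical.

```python
import random


def sample_is_white(p_first: float, p_last: float, rng: random.Random) -> bool:
    """Draw one Bernoulli sample with success probability mean(p_first, p_last)."""
    p = (p_first + p_last) / 2.0
    # rng.random() is uniform on [0, 1), so this is True with probability p.
    return rng.random() < p


# Hypothetical inputs for illustration only (not taken from either dataset).
rng = random.Random(0)  # seeded so repeated runs give the same draws
draws = [sample_is_white(0.8, 0.2, rng) for _ in range(10_000)]
share_white = sum(draws) / len(draws)  # should be close to the average, 0.5
```

Sampling rather than hard-labeling keeps the uncertainty in the name-based estimate: a name pair with an average probability of 0.5 is assigned each label about half the time.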