Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media 2021
DOI: 10.18653/v1/2021.socialnlp-1.11

Using Noisy Self-Reports to Predict Twitter User Demographics

Abstract: Computational social science studies often contextualize content analysis within standard demographics. Since demographics are unavailable on many social media platforms (e.g., Twitter), numerous studies have inferred demographics automatically. Despite many studies presenting proof-of-concept inference of race and ethnicity, training of practical systems remains elusive since there are few annotated datasets. Existing datasets are small, inaccurate, or fail to cover the four most common racial and ethnic groups…

Cited by 15 publications (28 citation statements)
References 60 publications
“…However, many individuals do not fit in these gender categories, some present a gender online inconsistent to their true identity (Nilizadeh et al, 2016), and they often experience depression and other mental health conditions at a higher rate (McDonald, 2018). For race/ethnicity labels, we consider the mutually-exclusive labeling conventions invoked by Brody et al (2018) and Wood-Doughty et al (2020): non-Hispanic White, non-Hispanic Black, non-Hispanic Asian and Hispanic/Latinx, as they are representative of the majority of racial and ethnic identities in the US. Our racial/ethnic categories do not capture multiracial individuals or those with a race/ethnicity outside this group.…”
Section: Ethical Considerations (mentioning)
confidence: 99%
“…Their classifier achieves an accuracy of 82.3% within intrinsic evaluations and shows even more promise as high-confidence thresholds are applied. To validate and further reduce noise in previously-inferred gender attributes, we train a new gender inference model on data from Burger et al (2011) using the same architecture of Wood-Doughty et al (2020). Our classifier obtains an accuracy of 83.3% amongst within-distribution data and outputs a distribution of inferred gender attributes that strongly aligns with that of the original datasets.…”
Section: Multitask (mentioning)
confidence: 99%