Proceedings of the First Workshop on NLP and Computational Social Science 2016
DOI: 10.18653/v1/w16-5608

#WhoAmI in 160 Characters? Classifying Social Identities Based on Twitter Profile Descriptions

Abstract: We combine social theory and NLP methods to classify English-speaking Twitter users' online social identity in profile descriptions. We conduct two text classification experiments. In Experiment 1 we use a 5-category online social identity classification based on identity and self-categorization theories. While we are able to automatically classify two identity categories (Relational and Occupational), automatic classification of the other three identities (Political, Ethnic/religious and Stigmatized) is chall…

Cited by 20 publications (24 citation statements). References 36 publications.

“…The accuracy of our keyword-based user classification scheme was determined by manually checking the profiles of a random subsample (∼10%) of users, from which the assigned classification was either verified or not. In our subsample, the classification scheme returned a precision rate of 85%, a success rate in line [with] or better than similar studies that conducted user classification analysis on Twitter (Wagner et al., 2012; Barthel et al., 2015; Priante et al., 2016; Côté and Darling, 2018b; Haustein, 2018). The simple dichotomous categorization scheme greatly reduces the risk of misclassification; however, the algorithm is not without limitations.…”
Section: Twitter Analytics Data and User Classification
Citation type: supporting; confidence: 63%
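The verification step in the snippet above (manually checking a random ~10% subsample and reporting the share of assignments that hold up) can be sketched as follows. This is a minimal illustration, not the citing authors' procedure: the function name `estimate_precision`, the `verify` callback standing in for a human check, and the toy data are all hypothetical. The returned share equals precision in the strict sense when the checked users are those the scheme assigned to the target class.

```python
import random


def estimate_precision(assigned, verify, sample_frac=0.10, seed=0):
    """Share of a random subsample of automatic assignments that a human confirms.

    assigned: dict mapping a user id to the class the keyword scheme assigned.
    verify:   callable (user_id, label) -> bool, a stand-in for a manual check.
    Returns (precision, number_of_users_checked).
    """
    if not assigned:
        raise ValueError("no classified users to sample from")
    users = sorted(assigned)                      # fixed order so the seed is reproducible
    k = max(1, round(len(users) * sample_frac))   # roughly 10% of users, at least one
    sample = random.Random(seed).sample(users, k)
    confirmed = sum(1 for u in sample if verify(u, assigned[u]))
    return confirmed / k, k


# Hypothetical data: 200 users assigned to the target class, and a stand-in
# "manual check" that confirms every user whose id is not divisible by 5.
assigned = {i: "target" for i in range(200)}
precision, checked = estimate_precision(assigned, lambda uid, label: uid % 5 != 0)
print(checked, round(precision, 2))
```

Fixing the seed makes the audit sample reproducible; in practice the sample size and seed would be reported alongside the resulting precision figure.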
“…Biographical user-related data provides one of the most accurate depictions of a user's true identity (Wagner et al., 2012), and was therefore the primary basis for our classification scheme. Rooted in previous methodologies (Barthel et al., 2015; Priante et al., 2016; Côté and Darling, 2018a,b), we built a pre-defined keyword and expression string list using words that related to our INREACH classification group, and were therefore likely to be mentioned in a user's biographical data (Table 1). The "stringr" package (Wickham, 2017) in R was used to filter user profiles that contained relevant keywords and expressions.…”
Section: Twitter Analytics Data and User Classification
Citation type: mentioning; confidence: 99%
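The keyword-based profile filter described above was implemented in R with the stringr package (for example via `str_detect()`). The sketch below is a rough Python analogue using the standard `re` module, not the citing authors' code. The keyword list is hypothetical because Table 1 is not reproduced here. Case-insensitive, whole-word matching is also an assumption: it is one reasonable design choice, not something the snippet specifies.

```python
import re

# Hypothetical keyword/expression list; the real list is the cited paper's Table 1,
# which is not reproduced here.
KEYWORDS = ["researcher", "scientist", "phd student", "professor"]


def build_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole word.

    Multi-word expressions tolerate any run of whitespace between their words.
    """
    parts = [r"\s+".join(re.escape(w) for w in kw.split()) for kw in keywords]
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


def filter_profiles(profiles, pattern):
    """Return the ids of users whose profile description matches the pattern."""
    return [uid for uid, bio in profiles.items() if bio and pattern.search(bio)]


profiles = {
    "u1": "PhD  student in ecology",
    "u2": "Coffee lover, dog person",
    "u3": "Professorship committee fan",   # no match: no word boundary after "professor"
    "u4": None,                            # empty bio is skipped
}
pattern = build_pattern(KEYWORDS)
print(filter_profiles(profiles, pattern))  # ['u1']
```

Word boundaries trade some recall (for example, "professorship" is not matched) for fewer false positives, which matters when precision is later audited by hand.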
“…Researchers have attempted to infer attributes of Twitter users such as age [23], [46], gender [6], [31], [35], [46], political orientation [11], [12], [40], [41] or a range of social identities [44]. Digging more deeply into the demographics of Twitter users, other researchers have attempted to infer socioeconomic demographics such as occupational class [42], income [43] and socioeconomic status [28].…”
Section: Related Work
Citation type: mentioning; confidence: 99%
“…Each identity term in the lexicon was further subdivided into four classes: political identities (e.g., senator, president), gendered identities (e.g., women, transgender), racial/nationality identities (e.g., Black, Filipino), and religious identities (e.g., priest, imam). Subclasses were based on related work on hate speech and social media identity prediction more generally [46,65]. Acknowledging that identities are intersectional [28], this subcategorization scheme was set up such that each identity term could be assigned any combination of the four identities mentioned above.…”
Section: Identity Analysis
Citation type: mentioning; confidence: 99%
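The multi-label lexicon scheme described in the last snippet, where each identity term may belong to any combination of four classes, can be sketched as a mapping from term to a set of classes. This is a minimal illustration under assumptions: the lexicon below contains only the example terms quoted in the snippet plus one hypothetical intersectional entry ("congresswoman"). The simple lowercase tokenizer and the function name `identity_classes` are also illustrative, not taken from the citing paper.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon using the example terms quoted above. Each term maps to a
# set of classes so a single term can carry several identities at once.
LEXICON = {
    "senator": {"political"},
    "president": {"political"},
    "women": {"gendered"},
    "transgender": {"gendered"},
    "black": {"racial/nationality"},
    "filipino": {"racial/nationality"},
    "priest": {"religious"},
    "imam": {"religious"},
    "congresswoman": {"political", "gendered"},  # hypothetical intersectional entry
}

TOKEN = re.compile(r"[a-z]+")


def identity_classes(text, lexicon=LEXICON):
    """Count how often each identity class is referenced by lexicon terms in text."""
    counts = Counter()
    for token in TOKEN.findall(text.lower()):
        for cls in lexicon.get(token, ()):
            counts[cls] += 1
    return counts


print(identity_classes("The senator met a congresswoman and an imam."))
# political: 2, gendered: 1, religious: 1
```

Storing a set per term, rather than a single label, is what lets one mention count toward several identity classes, mirroring the "any combination" rule in the snippet.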