2019
DOI: 10.31234/osf.io/zkunh
Preprint

Predicting a Salient Social Identity from Linguistic Style

Abstract: Social media data are already being used to classify individuals into mutually exclusive social groups. Here we propose a model based on self-categorization theory that classifies which of two social identities is salient within the same person using text data. Based on over 500,000 online forum posts and seven prototype-based style features, a trained classifier correctly distinguishes between posts written by the same person in two different social contexts – a parenting forum and a feminist forum – signific…

Cited by 1 publication
(7 citation statements)
references
References 40 publications
(54 reference statements)
“…Further, Chung and Pennebaker (2019), the developers of LIWC, advise using a minimum cut-off of 100 words where possible. In practice, word count cut-offs are often lower than this, especially when using sparser social media data (50 words – Bäck et al, 2018; 25 words – Koschate et al, 2019; 45 words – Nelson et al, 2017; 50 words – Petrie et al, 2008; 50 words – Wilson, 2019). Our choice of 50 words was made in order to keep as much data as possible in our analysis, whilst ensuring that the data could be used to draw psychologically meaningful conclusions (Pennebaker Conglomerates Inc., 2017).…”
Section: Methods (mentioning)
confidence: 99%
“…We first collected data from individuals writing with one of our two identities salient. In line with Koschate et al (2019), we used forum topic as a proxy for identity salience. We assumed that individuals posting in either an entrepreneur forum or a libertarian forum had the respective identities salient at the time of writing.…”
Section: Methods (mentioning)
confidence: 99%