Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2014
DOI: 10.3115/v1/d14-1121

Developing Age and Gender Predictive Lexica over Social Media

Abstract: Demographic lexica have potential for widespread use in social science, economic, and business applications. We derive predictive lexica (words and weights) for age and gender using regression and classification models from word usage in Facebook, blog, and Twitter data with associated demographic labels. The lexica, made publicly available, achieved state-of-the-art accuracy in language-based age and gender prediction over Facebook and Twitter, and were evaluated for generalization across social media genre…
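A lexicon of "words and weights" is typically applied by summing each word's weight multiplied by its relative frequency in a text, then adding an intercept. The following is a minimal sketch of that idea, not the authors' released code: the function name `lexicon_score`, the toy weights, and the intercept are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def lexicon_score(tokens, lexicon, intercept=0.0):
    """Return intercept + sum of weight * relative frequency over lexicon words.

    Assumption: `lexicon` maps a word to a numeric weight; this layout is
    illustrative and not the format of the published lexica.
    """
    if not tokens:
        # No words observed: only the intercept contributes.
        return intercept
    counts = Counter(tokens)
    total = len(tokens)
    return intercept + sum(
        weight * counts[word] / total
        for word, weight in lexicon.items()
        if word in counts
    )


# Hypothetical toy weights, not values from the published lexica.
toy_age_lexicon = {"homework": -40.0, "mortgage": 60.0}
tokens = "my mortgage payment is due".split()
age_estimate = lexicon_score(tokens, toy_age_lexicon, intercept=25.0)
print(age_estimate)
```

Because the score is a plain weighted sum, such a lexicon can be applied to new text without retraining a model, which is what makes a word-and-weight list convenient to distribute.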

Cited by 191 publications (212 citation statements)
References 27 publications
“…Their results were also consistent with Pyszczynski and Greenberg's theory of depression, in that texters with a smaller amount of self-focus were associated with more successful conversations. In addition, Schwartz et al (2014) showed that regression models based on Facebook language can be used to predict an individual's degree of depression.…”
Section: Linguistic and Social Indicators (mentioning)
confidence: 99%
“…However, for at least some participants, there was skepticism related to how well social media represents the mental health of users. (Schwartz et al, 2014), and using unsupervised Machine Learning techniques to explore depression-related language on Twitter (Resnik et al, 2015).…”
Section: Introduction (mentioning)
confidence: 99%
“…We used the demographic classification tool from the World Well-Being Project (Sap et al, 2014). For each depression and PTSD user we estimated their gender, forcing the classifier to make a binary decision as to whether the user was 'Female' or 'Male', and used the age estimate as-is (an ostensibly continuous variable).…”
Section: Age- and Gender-matched Controls (mentioning)
confidence: 99%
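The procedure described in this statement, forcing a binary gender label while keeping age as a continuous value, can be sketched as follows. This is an illustration under stated assumptions rather than the cited tool's interface: the function names, the zero threshold, and the convention that scores above the threshold map to 'Female' are all hypothetical.

```python
def binarize_gender(score, threshold=0.0):
    """Force a continuous gender score into one of two labels.

    Assumption: scores above the threshold map to 'Female'; both the sign
    convention and the default threshold are illustrative only.
    """
    return "Female" if score > threshold else "Male"


def demographic_estimates(gender_score, age_score):
    """Pair a forced binary gender label with the unmodified age estimate."""
    return {"gender": binarize_gender(gender_score), "age": age_score}


# Hypothetical scores for a single user.
print(demographic_estimates(gender_score=0.7, age_score=31.4))
```

Thresholding discards how close a score was to the decision boundary, whereas the age estimate keeps its full continuous value for matching.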
“…In order to estimate which tweets are more likely to be written by females or an older user, we use the classifier introduced in (Sap et al, 2014). This is a regularized Linear SVM that obtains state-of-the-art prediction results on user gender (91.9% accuracy) and age (r = .835) prediction from social media text.…”
Section: Methods (mentioning)
confidence: 99%