Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.334

Compositional Demographic Word Embeddings

Abstract: Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations. While personalized embeddings can be useful to improve language model performance and other language processing tasks, they can only be computed for people with a large amount of longitudinal data, which is not the case for new users. We propose a new form of personalized word embeddings that use demographic-specific word …

Cited by 18 publications (12 citation statements)
References: 38 publications
“…It may well be the case in the UK that word choice varies more between areas and genders than age cohorts, for example, a reviewer who cites the product's "lush vanilla taste" may reside in the West of England, while calling a bad service "shite" may indicate they are Scottish. This is an interesting counterfinding to Welch et al (2020) which found better embedding performance with age- and gender-aware representations in a global population. Differing privacy requirements for separate attributes are a feature of multiple variations on differential privacy regimes (Kamalaruban et al, 2020; Alaggan et al, 2017; Jorgensen et al, 2015).…”
Section: Discussion (mentioning)
confidence: 73%
“…We searched for English language Reddit posts with the phrase "I'm <age>" that were posted between December 1 and December 31, 2019. In the literature, similar approaches have been used to construct large Reddit data sets containing self-reported demographics [33][34][35]. We were interested in youth (13-17 years), young adult (18-20 years), and adult (21-54 years) age segments.…”
Section: Identifying Self-reporting Of Age On Reddit (mentioning)
confidence: 99%
“…Reddit users have been found to be primarily male, young adults (under 30), located in the USA and primarily identify as Christian or atheist (Welch et al, 2020). It is possible that results presented in this paper do not generalize to populations that differ significantly from the population of Reddit users.…”
Section: Personalized Word Embeddings (mentioning)
confidence: 76%