Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2005

Towards Robust and Privacy-preserving Text Representations

Abstract: Written text often provides sufficient clues to identify the author, their gender, age, and other important attributes. Consequently, the authorship of training and evaluation corpora can have unforeseen impacts, including differing model performance for different user groups, as well as privacy implications. In this paper, we propose an approach to explicitly obscure important author characteristics at training time, such that representations learned are invariant to these attributes. Evaluating on two tasks,…

Cited by 118 publications (127 citation statements)
References 17 publications
“…Others have attempted to remove biases from learned representations, e.g., gender biases in word embeddings (Bolukbasi et al, 2016) or sensitive information like sex and age in text representations (Li et al, 2018). However, removing such attributes from text representations may be difficult (Elazar & Goldberg, 2018).…”
Section: Fine-tuning On Target Datasets; citation type: mentioning
confidence: 99%
“…Users can also provide some demographic information. In the collected dataset, each review is associated with three attributes, gender (male/female), age, and location (Denmark, France, United Kingdom, and United States). We follow the same approach as in [39] and discard all non-English reviews based on LANGID.PY [40], and only keep reviews classified as English with a confidence greater than 0.9. We follow the setting of [32] and categorize age attribute into three groups, over-45, under-35, and between 35 and 45.…”
Section: Data; citation type: mentioning
confidence: 99%
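
The preprocessing described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function names, the `classify` callable (a hypothetical stand-in for a language identifier such as langid.py that returns a language code and a confidence between 0 and 1), and the treatment of ages exactly 35 and 45 as belonging to the middle group are all assumptions.

```python
from typing import Callable, Iterable, List, Tuple


def age_group(age: int) -> str:
    """Map an age to one of the three groups named in the statement.

    Assumption: the boundaries 35 and 45 fall in the middle group;
    the quoted text does not say which side they belong to.
    """
    if age < 35:
        return "under-35"
    if age <= 45:
        return "35-45"
    return "over-45"


def keep_english(
    reviews: Iterable[str],
    classify: Callable[[str], Tuple[str, float]],
    min_conf: float = 0.9,
) -> List[str]:
    """Keep reviews classified as English with confidence above min_conf.

    `classify` is a hypothetical callable returning (language, confidence).
    """
    kept = []
    for text in reviews:
        lang, conf = classify(text)
        if lang == "en" and conf > min_conf:
            kept.append(text)
    return kept
```

Passing the classifier in as a parameter keeps the sketch independent of any particular language-identification library.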
“…This data is obtained from Hovy et al [32] and consists of 600 sentences, each tagged with POS information based on the Google Universal POS tagset [45] and also labeled with both gender and age of the users. The gender attribute is categorized into male and female, and age attribute is categorized into two groups over-45, under-35. We follow the setting of [39] and use Web English Tree-bank (WebEng) [16] as a pre-training tagging model because of the small quantity of text available for this task. WebEng is similar to TrustPilot datasets w.r.t.…”
Section: Task 2: Part-of-speech (Pos) Tagging; citation type: mentioning
confidence: 99%