Proceedings of the ACM Web Conference 2022
DOI: 10.1145/3485447.3512232

DP-VAE: Human-Readable Text Anonymization for Online Reviews with Differentially Private Variational Autoencoders

Abstract: While vast amounts of personal data are shared daily on public online platforms and used by companies and analysts to gain valuable insights, privacy concerns are also on the rise: Modern authorship attribution techniques have proven effective at identifying individuals from their data, such as their writing style or behavior of picking and judging movies. It is hence crucial to develop data sanitization methods that allow sharing of users' data while protecting their privacy and preserving quality and content…

citations
Cited by 8 publications
(3 citation statements)
references
References 47 publications
“…Additionally, since multiple features can be created for each of the 𝑘 occurrences of a term, we define two data sets as being adjacent if they differ by all 𝐾 features associated with a given term, |𝑥| − |𝑥′| = 𝐾, 𝐾 ≥ 𝑘. Three previous works have explored providing actual user-level differential privacy against text-based linkage attacks [24, 25, 26], also known as authorship attribution attacks. At first glance this may sound similar to our work here; however, there is a key difference: these works focus on protecting the privacy of the people providing the text, rather than protecting the confidentiality of the text itself.…”
Section: User-level Privacy As Term-level Privacy (mentioning)
confidence: 99%
“…To generate human-readable text, Bo et al. (2021) employ an encoder-decoder model similar to ours, but without paraphrasing, and sample output words using (a two-set variant of) the Exponential mechanism (McSherry and Talwar, 2007). Weggenmann et al. (2022) propose a differentially private variation of the variational autoencoder and use it as a sequence-to-sequence architecture for text anonymization.…”
Section: Related Work (mentioning)
confidence: 99%
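
For readers unfamiliar with the Exponential mechanism named in the statement above, the following is a minimal Python sketch of its standard form: a candidate is drawn with probability proportional to exp(ε · score / (2 · sensitivity)). It is not the two-set variant used by Bo et al. and not code from any of the cited papers. The function names, the candidate words, and the scores are hypothetical and chosen only for illustration.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity=1.0):
    """Return selection probabilities proportional to exp(eps * score / (2 * sensitivity)).

    Hypothetical helper for illustration; not taken from the cited papers.
    """
    scale = epsilon / (2.0 * sensitivity)
    # Subtracting the maximum score keeps exp() from overflowing
    # and does not change the normalized probabilities.
    top = max(scores)
    weights = [math.exp(scale * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_word(candidates, scores, epsilon, sensitivity=1.0, rng=None):
    """Draw one candidate word according to the Exponential mechanism."""
    rng = rng or random.Random()
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Illustrative usage with made-up words and utility scores.
words = ["good", "great", "fine"]
utilities = [0.9, 0.7, 0.2]
chosen = sample_word(words, utilities, epsilon=1.0, rng=random.Random(0))
```

A smaller ε flattens the distribution toward uniform (more privacy, less utility), while a larger ε concentrates probability on the highest-scoring candidate.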
“…We furthermore use a domain-independent LDP mechanism specifically for VAEs, to which we refer as VAE-LDP. VAE-LDP by Weggenmann et al. [37] allows a data scientist to use a VAE as an LDP mechanism to perturb data. This is achieved by limiting the encoder's mean and adding noise to the encoder's standard deviation before sampling the latent code z during training.…”
Section: Differential Privacy (mentioning)
confidence: 99%
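
The latent sampling step described in the statement above can be sketched roughly as follows. This is one possible reading of the quoted description, not the authors' implementation: it assumes that "limiting the mean" means clipping it to an L2-norm bound and that "adding noise to the standard deviation" means widening the standard deviation by a fixed offset. The names `clip_l2`, `sample_latent`, `max_norm`, and `noise_std` are hypothetical, and the sketch makes no claim about the resulting privacy guarantee, which would depend on how these parameters are calibrated.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    factor = max_norm / norm
    return [v * factor for v in vec]


def sample_latent(mean, std, max_norm, noise_std, rng=None):
    """Sample a latent code z from a bounded mean and a widened std.

    Assumption (not confirmed by the source): the offset noise_std is
    added to every component of the encoder's standard deviation.
    """
    rng = rng or random.Random()
    bounded_mean = clip_l2(mean, max_norm)
    widened_std = [s + noise_std for s in std]
    # Reparameterized draw: z = mean + std * standard normal noise.
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(bounded_mean, widened_std)]


# Illustrative usage with made-up encoder outputs.
z = sample_latent(mean=[3.0, 4.0], std=[0.1, 0.1], max_norm=1.0,
                  noise_std=0.5, rng=random.Random(0))
```

Bounding the mean limits how far apart the latent distributions of two different inputs can be, and the extra spread in the standard deviation is what makes individual samples harder to attribute to a specific input.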