2012
DOI: 10.1007/978-3-642-28641-4_13
Provable De-anonymization of Large Datasets with Sparse Dimensions

Abstract: There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary inform…

Cited by 29 publications (74 citation statements)
References 15 publications
“…Our results build on existing frameworks that exploit data sparsity [6,12,8,19]. However, we examine a new type of data and de-anonymization scenarios previously not studied.…”
Section: Related Work
confidence: 98%
“…Researchers have discovered that maintaining user anonymity within sparse datasets is surprisingly difficult [6,12,8,19]. Within this research community, the term sparsity refers to datasets in which an individual user or identity can be distinguished from others in a dataset by only a few select rarely occurring user attributes.…”
Section: Threat Model
confidence: 99%
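The sparsity property quoted above can be made concrete with a small sketch: in a toy user-item dataset (all names and items below are invented, not taken from the cited work), one widely shared attribute narrows nothing down, while a couple of rarely occurring attributes already single out one user.

```python
# Toy illustration of sparsity-based re-identification.
# The records and item names are hypothetical.
records = {
    "u1": {"popular_movie", "rare_movie_a", "rare_movie_b"},
    "u2": {"popular_movie", "rare_movie_a"},
    "u3": {"popular_movie", "rare_movie_c"},
    "u4": {"popular_movie"},
}

def matches(known_items):
    """Return every user whose record contains all of the known items."""
    return [u for u, items in records.items() if known_items <= items]

# A popular attribute matches everyone, so it reveals nothing.
print(matches({"popular_movie"}))                  # all four users
# Two rare attributes already pin down a single user.
print(matches({"rare_movie_a", "rare_movie_b"}))   # only 'u1'
```

The same subset test scales to real sparse datasets: because most users rate or consume only a tiny fraction of the item space, almost any small set of rare items is unique to one record.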
“…There are also further studies in the literature that researched de-anonymisation [21]-[27], but these are not included in our survey as they are for analysis and evaluation purposes.…”
Section: Remarks
confidence: 99%
“…Second, we observe in this work that the primary reason for privacy leakage in auxiliary information attacks [11,26] is a centralized untrusted entity gathering detailed consumption about a pseudonymous end-user, even if the user's consumption is hidden within a group. A single untrusted aggregator possessing the scrambled consumption of all groups can mount attacks by first identifying the group that a user belongs to by exploiting auxiliary information, and can then infer sensitive attributes of the user if the group happens to predominantly revolve around sensitive topics (l-diversity [22] attacks).…”
Section: Introduction
confidence: 98%
“…This is because revealing multiple interests of a pseudonymous user is open to attacks based on auxiliary information, wherein an individual can be uniquely identified using a combination of few consumed items or interests (see Netflix attack [11,26]). …”
Section: Introduction
confidence: 99%
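The auxiliary-information attack these excerpts refer to can be sketched in simplified form, loosely in the spirit of the Narayanan-Shmatikov scoring approach analyzed by the paper: weight each item by its rarity, score every record against the adversary's partial knowledge, and claim a match only when the best score clearly separates from the runner-up. The data, the weighting function, and the gap threshold below are illustrative assumptions, not the algorithm as specified in the paper.

```python
# Simplified sketch of an auxiliary-information de-anonymization attack.
# Records, weights, and the gap threshold are hypothetical.
import math

records = {
    "alice": {"rare_a", "rare_b", "hit_1"},
    "bob":   {"rare_c", "hit_1", "hit_2"},
    "carol": {"hit_1", "hit_2"},
}

def weight(item):
    # Rare items carry more identifying weight (inverse log-popularity).
    support = sum(1 for r in records.values() if item in r)
    return 1.0 / math.log(1 + support) if support else 0.0

def score(aux, rec):
    # Total weight of the adversary's known items found in this record.
    return sum(weight(i) for i in aux if i in rec)

def deanonymize(aux, gap=0.5):
    ranked = sorted(records, key=lambda u: score(aux, records[u]), reverse=True)
    best, second = ranked[0], ranked[1]
    # Only claim a match if the top score is well separated from the
    # second-best score; otherwise refuse to identify anyone.
    if score(aux, records[best]) - score(aux, records[second]) > gap:
        return best
    return None

print(deanonymize({"rare_a", "rare_b"}))   # identifies 'alice'
print(deanonymize({"hit_1"}))              # None: item too common, no clear winner
```

The separation check is what makes the attack robust to noisy auxiliary data: a popular item raises every record's score roughly equally, so only rare items produce the gap needed to output an identification.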