2018
DOI: 10.1515/popets-2018-0004
SafePub: A Truthful Data Anonymization Algorithm With Strong Privacy Guarantees

Abstract: Methods for privacy-preserving data publishing and analysis trade off privacy risks for individuals against the quality of output data. In this article, we present a data publishing algorithm that satisfies the differential privacy model. The transformations performed are truthful, which means that the algorithm does not perturb input data or generate synthetic output data. Instead, records are randomly drawn from the input dataset and the uniqueness of their features is reduced. This also offers an intuitive …
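The mechanism outlined in the abstract (random sampling from the input followed by generalization and suppression of records that remain too unique) can be illustrated with a minimal sketch. The parameter names beta and k, the generalize helper, and the thresholding step are illustrative assumptions, not the paper's actual interface:

```python
import random
from collections import Counter

def truthful_anonymize(records, generalize, beta=0.5, k=5, seed=0):
    """Sample records, generalize them, and drop feature combinations rarer than k."""
    rng = random.Random(seed)
    # Step 1: random sampling -- each record is drawn independently,
    # so the output contains only unmodified (truthful) input records.
    sample = [r for r in records if rng.random() < beta]
    # Step 2: reduce uniqueness by mapping each record through a
    # generalization function (e.g., age 34 -> the interval 30-39).
    generalized = [generalize(r) for r in sample]
    # Step 3: suppress generalized records that remain too unique.
    counts = Counter(generalized)
    return [g for g in generalized if counts[g] >= k]

# Toy usage: generalize (age, zip) pairs to (decade, zip prefix).
toy = [(34, "81675"), (36, "81675"), (35, "81677")] * 10
print(truthful_anonymize(toy, lambda r: (r[0] // 10 * 10, r[1][:3])))
```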

Cited by 45 publications (39 citation statements)
References 43 publications
“…Randomized, or stochastic, privacy-preserving policies have been shown to cause problems, such as un-truthfulness [33], which can be undesirable in practice [34]. This is perhaps one of the reasons behind the low popularity of randomized privacy-preserving policies, such as differential privacy, within the financial or health sectors [33]. (The popularity of these methods is somewhat evident from the sheer number of available toolboxes for implementation: https://arx.deidentifier.org/overview/related-software/) For instance, randomized privacy-preserving policies in financial auditing have been criticized for complicating fraud detection [35], [36].…”
Section: Introduction
confidence: 99%
“…The methods for creating privacy-preserving prediction models presented in this article are compatible with all privacy models currently implemented by ARX (an overview is provided on the project website [22]). In this paper, we will use the following models to showcase our solution: (1) k-anonymity, which protects records from re-identification by requiring that each transformed record is indistinguishable from at least k − 1 other records regarding attributes that could be used in linkage attacks [15], (2) differential privacy, which guarantees that the output of the anonymization procedure is essentially independent of the contribution of individual records to the dataset, which protects output data from a wide range of risks [23,24], and (3) a game-theoretic model which employs an economic perspective on data re-identification attacks and assumes that adversaries will only attempt re-identification in case there is a tangible economic benefit [25,26].…”
Section: Privacy Models
confidence: 99%
“…Values of the attributes "age" and "sex" are transformed using level 2 and level 0, respectively, of their associated hierarchies. To overcome these limitations, we had to rewrite major parts of the internals of the software, and the resulting utility model is now the most complex model supported. Finally, we also had to develop and implement a specialized score function with proven mathematical properties to support differential privacy [24].…”
Section: Utility Models
confidence: 99%
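The transformation described in the quoted statement (level 2 for "age", level 0 for "sex") follows the usual hierarchy-based generalization scheme, in which each attribute has levels of increasing coarseness. A minimal sketch; the concrete hierarchy levels below are assumptions for illustration, since real hierarchies are user-defined:

```python
AGE_HIERARCHY = {
    0: lambda v: str(v),                       # level 0: exact age, "34"
    1: lambda v: f"{v//10*10}-{v//10*10+9}",   # level 1: decade, "30-39"
    2: lambda v: f"{v//20*20}-{v//20*20+19}",  # level 2: 20-year span, "20-39"
    3: lambda v: "*",                          # level 3: fully suppressed
}
SEX_HIERARCHY = {
    0: lambda v: v,    # level 0: value kept unchanged
    1: lambda v: "*",  # level 1: suppressed
}

def generalize_record(age, sex, age_level=2, sex_level=0):
    """Transform a record using one hierarchy level per attribute."""
    return (AGE_HIERARCHY[age_level](age), SEX_HIERARCHY[sex_level](sex))

print(generalize_record(34, "female"))  # ('20-39', 'female')
```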
“…Primarily, these models utilize the differential privacy guarantee of the Exponential mechanism [84], which defines a distribution to synthesize the data based on the input database and a pre-defined quality function. Various methods -both application-specific and application-independent -have been proposed [10,16,31,52,56,57,74,77,83,84,89,98,124,127]. Our approach contrasts with these works in two ways.…”
Section: Non-parametric Generative Models
confidence: 99%
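The Exponential mechanism [84] mentioned here selects an output r with probability proportional to exp(epsilon * q(D, r) / (2 * sensitivity)), where q is the quality (score) function. A minimal sketch, assuming a toy counting quality function with sensitivity 1; the function names are illustrative:

```python
import math
import random

def exponential_mechanism(data, candidates, quality, epsilon, sensitivity=1.0, seed=0):
    """Sample a candidate r with probability proportional to
    exp(epsilon * quality(data, r) / (2 * sensitivity))."""
    rng = random.Random(seed)
    scores = [quality(data, c) for c in candidates]
    # Shift by the maximum score for numerical stability; this rescales
    # all weights by a constant factor and leaves the distribution unchanged.
    m = max(scores)
    weights = [math.exp(epsilon * (s - m) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Toy usage: privately select the most frequent value in a list.
data = ["a", "a", "b", "c", "a", "b"]
print(exponential_mechanism(data, ["a", "b", "c"], lambda d, c: d.count(c), epsilon=1.0))
```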