2018
DOI: 10.1515/popets-2018-0004
SafePub: A Truthful Data Anonymization Algorithm With Strong Privacy Guarantees

Abstract: Methods for privacy-preserving data publishing and analysis trade off privacy risks for individuals against the quality of output data. In this article, we present a data publishing algorithm that satisfies the differential privacy model. The transformations performed are truthful, which means that the algorithm does not perturb input data or generate synthetic output data. Instead, records are randomly drawn from the input dataset and the uniqueness of their features is reduced. This also offers an intuitive …
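The mechanism outlined in the abstract (random sampling from the input followed by generalization and suppression of records that remain too unique) can be illustrated with a minimal sketch. The parameter names beta and k, the generalize helper, and the thresholding step are illustrative assumptions, not the paper's actual interface:

```python
import random
from collections import Counter

def truthful_anonymize(records, generalize, beta=0.5, k=5, seed=0):
    """Sample records, generalize them, and drop feature combinations rarer than k."""
    rng = random.Random(seed)
    # Step 1: random sampling -- each record is drawn independently,
    # so the output contains only unmodified (truthful) input records.
    sample = [r for r in records if rng.random() < beta]
    # Step 2: reduce uniqueness by mapping each record through a
    # generalization function (e.g., age 34 -> the interval 30-39).
    generalized = [generalize(r) for r in sample]
    # Step 3: suppress generalized records that remain too unique.
    counts = Counter(generalized)
    return [g for g in generalized if counts[g] >= k]

# Toy usage: generalize (age, zip) pairs to (decade, zip prefix).
toy = [(34, "81675"), (36, "81675"), (35, "81677")] * 10
print(truthful_anonymize(toy, lambda r: (r[0] // 10 * 10, r[1][:3])))
```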

Cited by 45 publications (39 citation statements)
References 43 publications
“…Randomized, or stochastic, privacy-preserving policies have been shown to cause problems, such as un-truthfulness [33], which can be undesirable in practice [34]. This is perhaps one of the reasons behind the low popularity of randomized privacy-preserving policies, such as differential privacy, within the financial or health sectors [33]. (The popularity of these methods is somewhat evident from the sheer number of available toolboxes for implementation: https://arx.deidentifier.org/overview/related-software/) For instance, randomized privacy-preserving policies in financial auditing have been criticized for complicating fraud detection [35], [36].…”
Section: Introduction
confidence: 99%
“…The methods for creating privacy-preserving prediction models presented in this article are compatible with all privacy models currently implemented by ARX (an overview is provided on the project website [22]). In this paper, we will use the following models to showcase our solution: (1) k-anonymity, which protects records from re-identification by requiring that each transformed record is indistinguishable from at least k − 1 other records regarding attributes that could be used in linkage attacks [15], (2) differential privacy, which guarantees that the output of the anonymization procedure is essentially independent of the contribution of individual records to the dataset, which protects output data from a wide range of risks [23,24], and (3) a game-theoretic model which employs an economic perspective on data re-identification attacks and assumes that adversaries will only attempt re-identification in case there is a tangible economic benefit [25,26].…”
Section: Privacy Models
confidence: 99%
“…Values of the attributes "age" and "sex" are transformed using level 2 and level 0, respectively, of their associated hierarchies. To overcome these limitations, we had to rewrite major parts of the internals of the software, and the resulting utility model is now the most complex model supported. Finally, we also had to develop and implement a specialized score function with proven mathematical properties to support differential privacy [24].…”
Section: Utility Models
confidence: 99%
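The transformation described in the quoted statement (level 2 for "age", level 0 for "sex") follows the usual hierarchy-based generalization scheme, in which each attribute has levels of increasing coarseness. A minimal sketch; the concrete hierarchy levels below are assumptions for illustration, since real hierarchies are user-defined:

```python
AGE_HIERARCHY = {
    0: lambda v: str(v),                       # level 0: exact age, "34"
    1: lambda v: f"{v//10*10}-{v//10*10+9}",   # level 1: decade, "30-39"
    2: lambda v: f"{v//20*20}-{v//20*20+19}",  # level 2: 20-year span, "20-39"
    3: lambda v: "*",                          # level 3: fully suppressed
}
SEX_HIERARCHY = {
    0: lambda v: v,    # level 0: value kept unchanged
    1: lambda v: "*",  # level 1: suppressed
}

def generalize_record(age, sex, age_level=2, sex_level=0):
    """Transform a record using one hierarchy level per attribute."""
    return (AGE_HIERARCHY[age_level](age), SEX_HIERARCHY[sex_level](sex))

print(generalize_record(34, "female"))  # ('20-39', 'female')
```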
“…Primarily, these models utilize the differential privacy guarantee of the Exponential mechanism [84], which defines a distribution to synthesize the data based on the input database and a pre-defined quality function. Various methods -both application-specific and application-independent -have been proposed [10,16,31,52,56,57,74,77,83,84,89,98,124,127]. Our approach contrasts with these works in two ways.…”
Section: Non-parametric Generative Models
confidence: 99%
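The Exponential mechanism [84] mentioned here selects an output r with probability proportional to exp(epsilon * q(D, r) / (2 * sensitivity)), where q is the quality (score) function. A minimal sketch, assuming a toy counting quality function with sensitivity 1; the function names are illustrative:

```python
import math
import random

def exponential_mechanism(data, candidates, quality, epsilon, sensitivity=1.0, seed=0):
    """Sample a candidate r with probability proportional to
    exp(epsilon * quality(data, r) / (2 * sensitivity))."""
    rng = random.Random(seed)
    scores = [quality(data, c) for c in candidates]
    # Shift by the maximum score for numerical stability; this rescales
    # all weights by a constant factor and leaves the distribution unchanged.
    m = max(scores)
    weights = [math.exp(epsilon * (s - m) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Toy usage: privately select the most frequent value in a list.
data = ["a", "a", "b", "c", "a", "b"]
print(exponential_mechanism(data, ["a", "b", "c"], lambda d, c: d.count(c), epsilon=1.0))
```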