2012
DOI: 10.1016/j.procs.2012.09.050

Towards A Differential Privacy and Utility Preserving Machine Learning Classifier

Abstract: Many organizations transact in large amounts of data often containing personal identifiable information (PII) and various confidential data. Such organizations are bound by state, federal, and international laws to ensure that the confidentiality of both individuals and sensitive data is not compromised. However, during the privacy preserving process, the utility of such datasets diminishes even while confidentiality is achieved--a problem that has been defined as NP-Hard. In this paper, we investigate a diffe…

Cited by 37 publications (29 citation statements)
References 18 publications
“…This method does not require discretization preprocessing of the data. In [20], Mivule et al. proposed a framework that uses AdaBoost iterations to update the dataset until the forest achieves an acceptable level of prediction accuracy. However, the framework lacks detail: it does not specify the differential privacy mechanism used or how the privacy budget is allocated.…”
Section: Related Work
Mentioning confidence: 99%
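The excerpt above describes that framework only at a high level, so the following is a hypothetical sketch of the kind of iteration loop it alludes to: Laplace input perturbation (an assumed mechanism, since the cited framework does not specify one), an AdaBoost ensemble, and a loop that adjusts the privacy budget until a target accuracy is reached. The dataset, threshold, and budget schedule are illustrative assumptions, not the authors' actual design.

```python
# Hypothetical sketch of the iteration described above: perturb the data,
# train an AdaBoost ensemble, and repeat until prediction accuracy is
# acceptable. The mechanism choice and budget schedule are assumptions,
# since the cited framework leaves them unspecified.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

epsilon, target_acc = 0.5, 0.90          # assumed starting budget and accuracy threshold
sensitivity = X_train.max(axis=0) - X_train.min(axis=0)  # per-attribute range

for round_ in range(10):
    # Laplace input perturbation at the current epsilon (assumed mechanism).
    noise = rng.laplace(0.0, sensitivity / epsilon, size=X_train.shape)
    model = AdaBoostClassifier(n_estimators=50).fit(X_train + noise, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"round {round_}: epsilon={epsilon:.2f}, accuracy={acc:.3f}")
    if acc >= target_acc:
        break
    epsilon *= 1.5                        # relax privacy until utility is acceptable
```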
“…Therefore, in this study, we focused on using input perturbation to perform a differentially private classification task. To this end, we adopted the input perturbation technique of differential privacy, as used in (Mivule et al., 2012; Sánchez et al., 2016; Sarwate & Chaudhuri, 2013; Senekane, 2019), to perform privacy-preserving classification. We experimentally analyzed the performance of the well-known classification algorithms C4.5, Naïve Bayes, Bayesian Networks, IBk, K*, One Rule, PART, Random Tree, and Ripper on differentially private data, obtained by applying input perturbation to 8 widely used UCI datasets at various privacy levels, varying ɛ from 1 to 5 and also considering small ɛ values (i.e., ɛ < 1).…”
Section: Differentially Private Classification
Mentioning confidence: 99%
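As a rough, non-authoritative illustration of the setup described in that excerpt, the sketch below applies Laplace input perturbation at several ɛ values and compares a few classifiers on the perturbed data. scikit-learn estimators stand in for the WEKA ones (DecisionTreeClassifier for C4.5, GaussianNB for Naïve Bayes, KNeighborsClassifier for IBk), and the wine dataset stands in for the 8 UCI datasets; these substitutions are assumptions, not the study's actual pipeline.

```python
# Rough illustration of the setup described above: apply Laplace input
# perturbation at several epsilon values and compare classifiers on the
# perturbed data. scikit-learn models stand in for the WEKA ones
# (DecisionTreeClassifier ~ C4.5, GaussianNB ~ Naive Bayes,
# KNeighborsClassifier ~ IBk); the dataset and parameters are assumptions.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X, y = load_wine(return_X_y=True)
sensitivity = X.max(axis=0) - X.min(axis=0)   # per-attribute range as assumed sensitivity

classifiers = {
    "C4.5-like tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "IBk (k-NN)": KNeighborsClassifier(n_neighbors=3),
}

for eps in (0.5, 1, 2, 3, 4, 5):              # small epsilon plus 1..5, as in the study
    X_priv = X + rng.laplace(0.0, sensitivity / eps, size=X.shape)
    for name, clf in classifiers.items():
        score = cross_val_score(clf, X_priv, y, cv=5).mean()
        print(f"eps={eps}: {name:15s} accuracy={score:.3f}")
```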
“…Input perturbation: In this technique, the data is perturbed by adding noise to the values of its numerical attributes for a given privacy level ɛ. Definition: let x be a d-dimensional vector representing a data instance in a database D. The differentially private version of x is x_priv = x + z, where z is a d-dimensional noise vector drawn from the Laplace probability density. By adding such noise to each individual data vector x_i in the database D, it is guaranteed that the resulting database D_priv = (x_priv1, x_priv2, x_priv3, …, x_privn) is an ɛ-differentially private approximation of D (Antonova, 2016; Dwork et al., 2006; Mivule et al., 2012; Sarwate & Chaudhuri, 2013; Senekane, 2019).…”
Section: Definitions and Formulations of Differential Privacy
Mentioning confidence: 99%
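A minimal sketch of that definition, assuming the per-attribute range serves as the sensitivity that sets the Laplace scale (the cited works derive the scale from sensitivity and ɛ); the function name and toy data are illustrative.

```python
# Minimal sketch of the definition above: x_priv = x + z with z drawn from a
# Laplace density. The per-attribute range is an assumed sensitivity used to
# set the Laplace scale b = sensitivity / epsilon.
import numpy as np

def perturb_database(D: np.ndarray, epsilon: float, rng=None) -> np.ndarray:
    """Return an epsilon-differentially-private approximation D_priv of D
    by adding Laplace noise to every d-dimensional record x."""
    rng = rng or np.random.default_rng()
    sensitivity = D.max(axis=0) - D.min(axis=0)   # assumed per-attribute sensitivity
    scale = sensitivity / epsilon                 # Laplace scale b = sensitivity / epsilon
    z = rng.laplace(0.0, scale, size=D.shape)     # one d-dimensional noise vector per record
    return D + z                                  # x_priv = x + z for every x in D

# Example usage on a toy numeric database.
D = np.array([[63.0, 120.0], [45.0, 135.0], [58.0, 110.0]])
D_priv = perturb_database(D, epsilon=1.0, rng=np.random.default_rng(7))
print(D_priv)
```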
“…Much research has been done by scholars at home and abroad. Reference [4] raised the issue of balancing availability and privacy in differential privacy protection. Because differential privacy protection is a data distortion technique, balancing availability and privacy is an NP-hard problem.…”
Section: Introduction
Mentioning confidence: 99%