Abstract: In this paper, we study the problem of constructing private classifiers using decision trees, within the framework of differential privacy. We first construct privacy-preserving ID3 decision trees using differentially private sum queries. Our experiments show that for many data sets a reasonable privacy guarantee can only be obtained via this method at a steep cost in prediction accuracy. We then present a differentially private decision tree ensemble algorithm using the random decision tree approach. We demonstrate experimentally that our approach yields good prediction accuracy even when the data sets are small. We also present a differentially private algorithm for the situation in which new data is periodically appended to an existing database. Our experiments show that our differentially private random decision tree classifier handles data updates in a way that maintains the same level of privacy guarantee.
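As an illustration of what a differentially private sum (counting) query can look like, here is a minimal sketch that assumes the standard Laplace mechanism; the abstract does not state which mechanism the authors use. The names `laplace_noise` and `private_count` are hypothetical and chosen only for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Assumption: adding or removing one record changes the count by at
    most 1 (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    labels = ["yes", "no", "yes", "yes", "no"]
    noisy = private_count(labels, lambda y: y == "yes", epsilon=1.0, rng=rng)
    print(f"noisy count of 'yes' labels: {noisy:.2f}")
```

A smaller `epsilon` means a stronger privacy guarantee but larger noise. When true counts are small, as they are deep in a decision tree, the noise can dominate, which is consistent with the accuracy cost the abstract reports for the ID3 approach.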
We present a simple I/O-efficient k-clustering algorithm that was designed with the goal of enabling a privacy-preserving version of the algorithm. Our experiments show that this algorithm produces cluster centers that are, on average, more accurate than the ones produced by the well-known iterative k-means algorithm. We use our new algorithm as the basis for a communication-efficient privacy-preserving k-clustering protocol for databases that are horizontally partitioned between two parties. Unlike existing privacy-preserving protocols based on the k-means algorithm, this protocol does not reveal intermediate candidate cluster centers.
We investigate the query complexity of exact learning in the membership and (proper) equivalence query model. We give a complete characterization of concept classes that are learnable with a polynomial number of polynomial-sized queries in this model. We give applications of this characterization, including results on learning a natural subclass of DNF formulas, and on learning with membership queries alone. Query complexity has previously been used to prove lower bounds on the time complexity of exact learning. We show a new relationship between query complexity and time complexity in exact learning: If any “honest” class is exactly and properly learnable with polynomial query complexity, but not learnable in polynomial time, then P ≠ NP. In particular, we show that an honest class is exactly polynomial-query learnable if and only if it is learnable using an oracle for $\Sigma_4^p$.
Motivated by the semi-supervised model in the data mining literature, we propose a model for differentially private learning in which private data is augmented by public data to achieve better accuracy. Our main result is a differentially private classifier with significantly improved accuracy compared to previous work. We experimentally demonstrate that such a classifier produces good prediction accuracy even in situations where the amount of private data is fairly limited. This expands the range of useful applications of differential privacy, since typical results in the differential privacy model require large private data sets to obtain good accuracy.