Proceedings of the 2006 SIAM International Conference on Data Mining 2006
DOI: 10.1137/1.9781611972764.47

A New Privacy-Preserving Distributed k-Clustering Algorithm

Abstract: We present a simple I/O-efficient k-clustering algorithm that was designed with the goal of enabling a privacy-preserving version of the algorithm. Our experiments show that this algorithm produces cluster centers that are, on average, more accurate than the ones produced by the well-known iterative k-means algorithm. We use our new algorithm as the basis for a communication-efficient privacy-preserving k-clustering protocol for databases that are horizontally partitioned between two parties. Unlike existing p…

Citations: cited by 84 publications (55 citation statements)
References: 18 publications
“…However, these techniques typically do not work directly on the actual perturbed data (like our technique), but attempt to reconstruct the original data distribution using the known noise distribution that has been added on the dataset [1,17]. Privacy preservation can also be achieved through limited dataset view, for example, by horizontal or vertical distribution of the data to different sites [21,9,25]. In our setting, the dataset cannot be dissected in portions, but is being distributed as a whole.…”
Section: Methodology and Difference From Previous Work
Citation type: mentioning
confidence: 99%
“…A stand-alone approach to privacy-preserving imputation can therefore be used in combination with any existing privacy-preserving data mining algorithm for the same distributed setting. In particular, our results in this paper are suitable for use with any privacy-preserving data mining algorithm for data that is horizontally partitioned between two parties (e.g., [23,20,18]). …”
Section: Related Work
Citation type: mentioning
confidence: 98%
“…Contrary to the above approaches, we do not attempt to reconstruct the original data distribution but work directly on the perturbed data, while guaranteeing preservation of distance properties on them. Privacy-protection via dataset partition is achieved using horizontal or vertical data partitioning [29,12,30,31]. Different portions of the data are distributed to different sites, and data exchange without leakage of private information becomes possible through cryptographic techniques (multiparty computation).…”
Section: Related Work
Citation type: mentioning
confidence: 99%