Efficient Privacy Preserving Distributed K-Means for Non-IID Data

Brandão, André; Mendes, Ricardo; Vilela, João P.

doi:10.1007/978-3-030-74251-5_35

Cited by 4 publications

(3 citation statements)

References 27 publications


“…ii) Client Initialization First: Since the initial centroid selection starts closer to the data source, this approach has shown promise in enhancing overall performance. Brandao et al [13] suggested an approach where each client first locally determines the optimal K̃ within the range [1, K] using the Silhouette metric [14]. Subsequently, they share the best K̃ initial centroids with the server.…”

Section: Related Work (mentioning)

confidence: 99%

“…Dennis et al [17] introduced a one-shot federated clustering scheme based on K-means (known as K-FED). However, a practical drawback shared with Brandao's [13] method is the substantial computational burden placed on edge devices, which typically have constrained computing capabilities. Similar to Dennis et al's [17] approach, we start the initialization for federated K-means at the edge clients, where the data resides.…”

Section: Related Work (mentioning)

confidence: 99%


Greedy centroid initialization for federated K-means

Yang, Mohammadi Amiri, Kulkarni

2024

Knowl Inf Syst


In this paper, our focus is on K-means within a federated setting, where clients retain their raw data on local devices, and the raw data never leaves the corresponding devices. Given the importance of initialization on the federated K-means algorithm, our objective is to find better initial centroids by utilizing the local data stored on each client. To this end, we start the centroid initialization at the clients, rather than at the server, since the server initially lacks any preliminary insight into the clients' data. The clients first select their local initial clusters, and subsequently share their clustering information (including cluster centroids and sizes) with the server. The server then employs a greedy algorithm to determine the global initial centroids based on the information received from the clients. Numerical results obtained from both synthetic and public datasets demonstrate that our proposed method can achieve better and more stable performance than three distinct federated K-means variants, and comparable performance to the centralized K-means algorithm.



“…Negative database-based methods have two problems: conversion to negative database is not possible for all kinds of data and there is huge overhead on data owner side for negative database construction. Brando et al (2021) [18] proposed a distributed privacy preserving K-mean algorithm. Client compute K-mean for their data locally and send the centroids to a server.…”

Section: Related Work (mentioning)

confidence: 99%

Hybrid Cloud-Based Privacy Preserving Clustering as Service for Enterprise Big Data

Kulkarni, Manjunath

2023

IJRITCC


Clustering as a service is offered by many cloud service providers. It helps enterprises learn hidden patterns and extract knowledge from the large volumes of data they generate. Though it brings a lot of value to enterprises, it also exposes the data to various security and privacy threats. Privacy preserving clustering has been proposed as a solution to this problem. But the outsourced service model for privacy preserving clustering places too much overhead on the querying user, lacks adaptivity to incremental data, and involves frequent interaction between the service provider and the querying user. There is also a lack of personalization of clustering by the querying user. This work, “Locality Sensitive Hashing for Transformed Dataset (LSHTD)”, proposes a hybrid cloud-based clustering as a service model for streaming data that addresses the problems in existing models such as privacy preserving k-means clustering outsourcing under multiple keys (PPCOM) and secure nearest neighbor clustering (SNNC). The solution combines a hybrid cloud with the LSHTD clustering algorithm as an outsourced service model. In experiments, the proposed solution is found to reduce the computation cost by 23% and the communication cost by 6%, and to provide better clustering accuracy, with ARI greater than 4.59% compared to existing works.
