The growing demand for everyday data insights drives the pursuit of more sophisticated infrastructures and artificial intelligence algorithms. When combined with the growing number of interconnected devices, this originates concerns about scalability and privacy. The main problem is that devices can detect the environment and generate large volumes of possibly identifiable data. Public cloud-based technologies have been proposed as a solution, due to their high availability and low entry costs. However, there are growing concerns regarding data privacy, especially with the introduction of the new General Data Protection Regulation, due to the inherent lack of control caused by using off-premise computational resources on which public cloud belongs. Users have no control over the data uploaded to such services as the cloud, which increases the uncontrolled distribution of information to third parties. This work aims to provide a modular approach that uses cloud-of-clouds to store persistent data and reduce upfront costs while allowing information to remain private and under users’ control. In addition to storage, this work also extends focus on usability modules that enable data sharing. Any user can securely share and analyze/compute the uploaded data using private computing without revealing private data. This private computation can be training machine learning (ML) models. To achieve this, we use a combination of state-of-the-art technologies, such as MultiParty Computation (MPC) and K-anonymization to produce a complete system with intrinsic privacy properties.
Privacy is becoming a crucial requirement in many machine learning systems. In this paper we introduce an efficient and secure distributed K-Means algorithm, that is robust to non-IID data. The base idea of our proposal consists in each client computing the K-Means algorithm locally, with a variable number of clusters. The server will use the resultant centroids to apply the K-Means algorithm again, discovering the global centroids. To maintain the client's privacy, homomorphic encryption and secure aggregation is used in the process of learning the global centroids. This algorithm is efficient and reduces transmission costs, since only the local centroids are used to find the global centroids. In our experimental evaluation, we demonstrate that our strategy achieves a similar performance to the centralized version even in cases where the data follows an extreme non-IID form.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.