Privacy protection in cloud computing has attracted considerable research interest because outsourced computation can disclose sensitive information. To protect user privacy during clustering in a cloud computing environment, we present a privacy-preserving density peak clustering (PPDPC) algorithm that discloses neither participants' private information nor the cluster centers. Our scheme covers the two steps of density peak clustering. First, a cloud service provider calculates the cluster centers without learning any participant's private data and without disclosing cluster center information to the other participants. Second, participants are assigned to clusters securely, so that no participant can identify the other members of its own cluster. Security analysis and comparison experiments show that the proposed PPDPC algorithm not only achieves good accuracy relative to density peak clustering but also resists collusion attacks, even when the cloud service provider colludes with all but one participant. Both theoretical analysis and experimental results confirm the security and accuracy of our method.

KEYWORDS
cloud computing, data mining, density peak clustering, homomorphic encryption, privacy preservation
INTRODUCTION

With the rapid development of mobile social networks and computer technology, mobile terminals and servers of all kinds now generate huge amounts of data continuously, posing a serious challenge to the computing capacity of enterprises [1,2]. Cloud computing technology, which addresses this challenge, continues to grow. More and more enterprises store their data on cloud servers to reduce costs, and the cloud's substantial computing power makes it convenient to process huge amounts of data [3-5]. Additionally, data mining technology helps users analyze large amounts of data and extract valuable information from it in scientific research and business applications. Analyzing this data allows future development trends and directions to be predicted [6]. Clustering, one of the important research methods in data mining, aims to divide data objects into several clusters such that the similarity of objects within a cluster is high, while the similarity between clusters is low.

In the process of clustering analysis, a large amount of private user data, such as geographical location, electricity consumption data, and spatiotemporal sensing data, is collected and analyzed [7-9]. The security and privacy of this data depend on the security of cloud services.

When sensitive data is outsourced directly to the cloud server for calculation, the user's privacy may be leaked if the cloud service provider is malicious or dishonest. If multiple users collude, they can combine their own information to calculate the distances between them and then derive the cluster centers from those distances. If users' private data and the cluster centers are disclosed, serious consequences can result.

Therefore, the development of a data mining techn...