Data publishing techniques have led to breakthroughs in several areas. These tools provide a promising direction. However, when they are applied to private or sensitive data such as patient medical records, the published data may divulge critical patient information. In order to address this issue, we propose a differential private data publishing method (SSKM_DP) based on the SFLA-Kohonen network, which perturbs sensitive attributes based on the maximum information coefficient to achieve a trade-off between security and usability. Additionally, we introduced a single-population frog jump algorithm (SFLA) to optimize the network. Extensive experiments on benchmark datasets have demonstrated that SSKM_DP outperforms state-of-the-art methods for differentially private data publishing techniques significantly.
As a social information product, the privacy and usability of high-dimensional data are the core issues in the field of privacy protection. Feature selection is a commonly used dimensionality reduction processing technique for high-dimensional data. Some feature selection methods only process some of the features selected by the algorithm and do not take into account the information associated with the selected features, resulting in the usability of the final experimental results not being high. This paper proposes a hybrid method based on feature selection and a cluster analysis to solve the data utility and privacy problems of high-dimensional data in the actual publishing process. The proposed method is divided into three stages: (1) screening features; (2) analyzing the clustering of features; and (3) adaptive noise. This paper uses the Wisconsin Breast Cancer Diagnostic (WDBC) database from UCI’s Machine Learning Library. Using classification accuracy to evaluate the performance of the proposed method, the experiments show that the original data are processed by the algorithm in this paper while protecting the sensitive data information while retaining the contribution of the data to the diagnostic results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.