k-Anonymity by microaggregation is one of the most commonly used anonymization techniques. Its success is owed to the worthwhile tradeoff it achieves between information loss and identity disclosure risk. However, the method has some drawbacks. On the disclosure limitation side, it lacks protection against attribute disclosure. On the data utility side, dealing with real datasets is challenging, since they are characterized by a large number of attributes and by the presence of noisy data, such as outliers or even records with missing values. Generating anonymized individual data that remain useful for data mining tasks, while decreasing the influence of noisy data, is a demanding task. In this paper, we introduce a new microaggregation method, called HM-PFSOM, based on fuzzy possibilistic clustering. The proposed method operates in a hybrid manner: the anonymization process is applied per block of similar data, which helps decrease the information loss incurred during anonymization. The HM-PFSOM approach studies the distribution of confidential attributes within each sub-dataset. Then, according to that distribution, the privacy parameter k is determined so as to preserve the diversity of confidential attributes within the anonymized microdata. This decreases the disclosure risk of confidential information.
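To make the general idea of microaggregation concrete, the following is a minimal sketch of a plain greedy, fixed-size variant. It is not HM-PFSOM, whose details (fuzzy possibilistic clustering, per-block processing, adaptive choice of k) the abstract does not give. The function names `microaggregate` and `min_diversity`, the squared Euclidean distance, and the seed-selection rule are all assumptions made for illustration only.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def microaggregate(records, k):
    """Hypothetical greedy microaggregation: every group has at least k records,
    and each record is replaced by the mean of its group."""
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")
    dim = len(records[0])
    centroid = tuple(sum(r[j] for r in records) / n for j in range(dim))
    unassigned = set(range(n))
    groups = []
    while len(unassigned) >= k:
        # Seed: the unassigned record farthest from the global centroid.
        seed = max(unassigned, key=lambda i: (sqdist(records[i], centroid), i))
        # Group the seed with its k-1 nearest unassigned neighbours.
        group = sorted(unassigned,
                       key=lambda i: (sqdist(records[i], records[seed]), i))[:k]
        groups.append(group)
        unassigned -= set(group)
    # Fewer than k records remain: attach each one to the group whose
    # first member is closest, so no group falls below size k.
    for i in sorted(unassigned):
        nearest = min(groups, key=lambda g: sqdist(records[i], records[g[0]]))
        nearest.append(i)
    # Replace every record by the mean of its group.
    anonymized = [None] * n
    for g in groups:
        mean = tuple(sum(records[i][j] for i in g) / len(g) for j in range(dim))
        for i in g:
            anonymized[i] = mean
    return anonymized, groups

def min_diversity(groups, confidential):
    """Smallest number of distinct confidential values found in any group."""
    return min(len({confidential[i] for i in g}) for g in groups)
```

A group in which `min_diversity` reports a single distinct value is k-anonymous yet still reveals the confidential attribute of all its members. That is the attribute disclosure problem the abstract describes, and the reason it proposes choosing k from the distribution of confidential attributes.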
One of the most difficult problems in cluster analysis is identifying the number of groups in a dataset, especially in the presence of missing values. Traditional clustering methods assume that the true number of clusters is known; in real-world applications, however, it is generally not known a priori. Moreover, most clustering methods were developed to analyse complete datasets and therefore cannot be applied to many practical problems, e.g., those involving incomplete data. This paper focuses, first, on a fuzzy clustering algorithm called OCS-FSOM. The proposed algorithm is based on a neural network and uses the Optimal Completion Strategy to estimate missing values in incomplete datasets. We then propose an extension of the algorithm, a multi-level OCS-FSOM method, to tackle the problem of estimating the number of clusters. The new algorithm, called Multi-OCSFSOM, is able to find the optimal number of clusters by using a statistical criterion that measures the quality of the obtained partitions. Experiments carried out on real-life datasets show very encouraging results in terms of exact determination of the optimal number of clusters.
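The Optimal Completion Strategy can be illustrated on top of ordinary fuzzy c-means, which is swapped in here for the fuzzy self-organizing map used by OCS-FSOM because the abstract does not specify the network. In this sketch, missing entries are treated as extra variables and re-estimated at every iteration as a membership-weighted average of the cluster prototypes. The function name `ocs_fcm`, the use of `None` to mark missing values, the column-mean initialisation, and the fixed iteration count are assumptions for illustration. The sketch also assumes every column has at least one observed value.

```python
import random

def ocs_fcm(data, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means with an Optimal Completion Strategy step.
    `data` is a list of rows; None marks a missing value (assumed convention)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    missing = [[value is None for value in row] for row in data]
    # Initialise each missing entry with the mean of the observed column values.
    col_mean = []
    for j in range(dim):
        observed = [row[j] for row in data if row[j] is not None]
        col_mean.append(sum(observed) / len(observed))
    x = [[col_mean[j] if missing[k][j] else data[k][j] for j in range(dim)]
         for k in range(n)]
    # Initialise the prototypes with c distinct (completed) records.
    v = [list(x[i]) for i in rng.sample(range(n), c)]
    u = []
    for _ in range(iters):
        # Membership update: u[k][i] is proportional to D_ik^(-1/(m-1)),
        # where D_ik is the squared distance from record k to prototype i.
        u = []
        for k in range(n):
            d = [max(sum((x[k][j] - v[i][j]) ** 2 for j in range(dim)), 1e-12)
                 for i in range(c)]
            w = [dist ** (-1.0 / (m - 1.0)) for dist in d]
            total = sum(w)
            u.append([wi / total for wi in w])
        # Prototype update: membership-weighted mean of the completed records.
        for i in range(c):
            den = sum(u[k][i] ** m for k in range(n))
            v[i] = [sum(u[k][i] ** m * x[k][j] for k in range(n)) / den
                    for j in range(dim)]
        # OCS step: re-estimate only the missing entries from the prototypes.
        for k in range(n):
            den = sum(u[k][i] ** m for i in range(c))
            for j in range(dim):
                if missing[k][j]:
                    x[k][j] = sum(u[k][i] ** m * v[i][j] for i in range(c)) / den
    return x, u, v
```

Observed values are never modified. Each imputed value is a convex combination of the prototype coordinates, so it always lies between the smallest and largest prototype value in that dimension. Choosing the number of clusters, as Multi-OCSFSOM does, would need an additional validity criterion applied over several values of `c`. The abstract does not name that criterion, so it is left out of this sketch.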