Driven by the growing success of machine learning in healthcare, medical institutions want to pool their patients' data to build more accurate models that support better decisions. However, they are reluctant to do so because the data is private. Building the best models also requires selecting the best features, in this case over horizontally distributed private biomedical data. Previously proposed solutions rely on data perturbation techniques, which cost performance. In this article, the researchers propose an original solution without perturbation, so data utility, and therefore performance, is preserved. The proposed solution uses a genetic algorithm, a distributed Naïve Bayes classifier, and a trusted third party. The authors report that their results surpass those obtained by other researchers on the same problem.
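The combination named in this abstract (genetic algorithm, distributed Naïve Bayes, trusted third party) can be illustrated with a minimal sketch. It is not the authors' protocol: all function names are hypothetical, features are assumed to be binary, Laplace smoothing is assumed, and the validation records are assumed to be visible to the party that scores a feature mask, which a real deployment would have to avoid. The point it shows is that, for horizontally partitioned data, each site only needs to send counts, and the summed counts give the same Naïve Bayes model as the pooled data would.

```python
import math
import random
from collections import Counter

def local_counts(rows, labels):
    """Run by each site on its own records; only these counts leave the site."""
    class_counts = Counter(labels)
    feat_counts = Counter()
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            feat_counts[(j, v, y)] += 1
    return class_counts, feat_counts

def aggregate(site_counts):
    """Run by the (assumed) trusted third party: sum the per-site counts."""
    total_class, total_feat = Counter(), Counter()
    for class_counts, feat_counts in site_counts:
        total_class.update(class_counts)   # Counter.update adds counts
        total_feat.update(feat_counts)
    return total_class, total_feat

def predict(row, mask, class_counts, feat_counts, n_values=2):
    """Naive Bayes prediction using only the features switched on in mask."""
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, cy in sorted(class_counts.items()):
        score = math.log(cy / n)
        for j, v in enumerate(row):
            if mask[j]:
                # Laplace-smoothed estimate of P(feature j = v | class y)
                score += math.log((feat_counts[(j, v, y)] + 1) / (cy + n_values))
        if score > best_score:
            best, best_score = y, score
    return best

def fitness(mask, model, val_rows, val_labels):
    """Wrapper fitness: accuracy of the masked classifier on validation data."""
    hits = sum(predict(r, mask, *model) == y for r, y in zip(val_rows, val_labels))
    return hits / len(val_labels)

def genetic_search(n_features, score, generations=20, pop_size=12, seed=0):
    """Tiny genetic algorithm over bit masks (needs n_features >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:              # occasional bit-flip mutation
                child[rng.randrange(n_features)] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

A possible use: each hospital calls `local_counts` on its own rows, the coordinator calls `aggregate`, and `genetic_search(n_features, lambda m: fitness(m, model, val_rows, val_labels))` returns a candidate feature mask.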
Machine learning is a powerful tool for mining useful knowledge from vast databases. Many medical establishments, such as hospitals and laboratories, want to join forces to extract more accurate models. However, this approach faces two problems. First, because of laws protecting patient privacy and similar concerns, parties are reluctant to share their data. Second, within vast amounts of data, it is unclear which attributes are useful and pertinent for constructing accurate data mining models. In this article, the researchers address these challenges for vertically distributed medical data. They propose an original secure wrapper solution that performs feature selection based on genetic algorithms and distributed Naïve Bayes. In contrast to previous solutions, the original data is not perturbed, so data utility and performance are preserved. They show that the proposed solution selects relevant attributes that increase performance while preserving patient privacy.
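For vertically distributed data, each party holds different attributes of the same patients. The sketch below illustrates why Naïve Bayes suits that setting: under its independence assumption, the class score is a sum of per-attribute log-likelihoods, so each party can score its own columns and only the per-class partial sums need to be combined. This is an illustration under stated assumptions, not the authors' secure protocol: function names are hypothetical, attributes are binary, class labels are assumed to be known to every party, and the partial scores are exchanged in the clear here, whereas a secure scheme would protect them.

```python
import math
from collections import Counter

def fit_party(columns, labels, n_values=2):
    """One party fits Laplace-smoothed conditional tables for its own columns."""
    class_counts = Counter(labels)
    tables = []
    for col in columns:
        counts = Counter(zip(col, labels))
        tables.append({
            (v, y): (counts[(v, y)] + 1) / (class_counts[y] + n_values)
            for v in range(n_values)
            for y in class_counts
        })
    return tables

def party_scores(tables, values, classes):
    """Partial log-likelihood per class, computed locally by one party."""
    return {
        y: sum(math.log(t[(v, y)]) for t, v in zip(tables, values))
        for y in classes
    }

def combine(prior, partial_scores):
    """Add the class log-prior to the parties' partial scores, pick the best class."""
    total = {y: math.log(p) for y, p in prior.items()}
    for scores in partial_scores:
        for y, s in scores.items():
            total[y] += s
    return max(sorted(total), key=total.get)
```

In a wrapper feature selection, a candidate mask would simply decide which columns each party passes to `fit_party` and `party_scores`.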
Classification automatically assigns predefined classes to data, and it is one of the main applications of data mining. Complete access to all data is critical for building accurate models. However, data such as biomedical records can be highly sensitive and cannot be disclosed or shared with a third party, because doing so can harm individuals and organizations. The challenge is to preserve both the privacy and the usefulness of the data. Privacy-preserving classification addresses this problem by constructing collaborative models over networks without violating the data owners' privacy. In this article, the authors address two problems: privacy-preserving deduplication of records and privacy-preserving classification. They propose a randomized hash technique for deduplication and an enhanced privacy-preserving classification of horizontally distributed biomedical data based on two homomorphic encryption schemes. No private, intermediate, or final results are disclosed. Experiments show that their solution is efficient and secure, without loss of accuracy.
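The abstract does not detail the randomized hash technique, so the following is only a rough sketch of the general idea of comparing records without revealing them. It swaps in a standard keyed hash (HMAC-SHA256) under a random key assumed to be shared by the participating sites; the function names and the record normalization are hypothetical, and the homomorphic encryption part of the paper is not covered.

```python
import hashlib
import hmac

def normalize(record):
    """Canonical text form so trivially different spellings hash identically."""
    return "|".join(str(field).strip().lower() for field in record)

def blind(records, key):
    """Map each record to a keyed-hash token; only tokens are exchanged."""
    return {
        hmac.new(key, normalize(r).encode("utf-8"), hashlib.sha256).hexdigest(): r
        for r in records
    }

def deduplicate(site_a, site_b, key):
    """Return site B's records whose tokens do not also appear at site A."""
    tokens_a = blind(site_a, key)
    tokens_b = blind(site_b, key)
    shared = tokens_a.keys() & tokens_b.keys()
    return [r for token, r in tokens_b.items() if token not in shared]
```

The key could be generated once with `secrets.token_bytes(32)` and distributed to the sites out of band. Without the key, an outsider cannot recompute tokens to test guesses against them, which is the reason for preferring a keyed hash over a plain one in this sketch.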