The classification and prediction accuracy of Machine Learning (ML) algorithms, which often outperform human experts in the related fields, has enabled their use in areas such as health and disease prediction, image and speech recognition, cyber-security threat detection, and credit-card fraud detection. However, laws, ethics, and privacy concerns prevent ML algorithms from being used in many real-world scenarios. To overcome this problem, we introduce a set of flexible and secure building blocks that can be used to construct different privacy-preserving classification schemes on top of already trained ML models. Then, as a use case, we apply these blocks to build a privacy-preserving Naïve Bayes classifier in the semi-honest model, with an application to breast cancer detection. Our theoretical analysis and experimental results show that the proposed scheme is more efficient than several state-of-the-art schemes in terms of computation and communication cost, as well as in its security properties. Furthermore, our privacy-preserving scheme shows no loss of accuracy compared to the plain classifier.
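To make the setting concrete, the following is a minimal sketch of plain (non-private) Naïve Bayes scoring in log space, i.e., the computation that a privacy-preserving classifier of this kind protects. The toy parameters, feature layout, and class names are hypothetical, and the actual cryptographic building blocks of the scheme are not reproduced here. Note that working in log space turns the product of probabilities into a sum, which is exactly the shape of computation that additively homomorphic techniques can evaluate over encrypted values.

import math

# Hypothetical toy model: the server's trained Naive Bayes parameters,
# stored as log-probabilities so that classification reduces to additions.
log_prior = {"benign": math.log(0.7), "malignant": math.log(0.3)}
# log P(feature_i = value | class), indexed as [class][feature_index][value]
log_cond = {
    "benign":    [{0: math.log(0.8), 1: math.log(0.2)},
                  {0: math.log(0.6), 1: math.log(0.4)}],
    "malignant": [{0: math.log(0.3), 1: math.log(0.7)},
                  {0: math.log(0.1), 1: math.log(0.9)}],
}

def nb_classify(query):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c)."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_cond[c][i][x] for i, x in enumerate(query))
    return max(scores, key=scores.get)

print(nb_classify([1, 1]))  # -> "malignant" for this toy model

In a privacy-preserving variant, the per-feature additions above would be carried out over ciphertexts, so that the server never sees the query and the client never sees the model parameters.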
In the quest to satisfy rapidly increasing needs for processing power while avoiding expensive supercomputers, scientists look for cheaper parallel processing alternatives such as grids, clusters, and similar technologies. With the emergence of the cloud, it was only a matter of time before this would be done on the cloud as well. However, the cloud has a unique nature. On one side, its servers are heterogeneous in terms of their SOA as well as the maximum guaranteed processing power they offer. On the other side, the Internet (the default medium through which cloud services are offered) is often unpredictable and dynamic. Furthermore, during parallel processing we want to optimize time and price simultaneously, goals that are often in contradiction with each other. Three algorithms are studied and compared with the optimal solution, including our proposed cloud-adopted feedback algorithm, which outperforms the others.
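The abstract does not spell out the feedback algorithm, so the following is only an illustrative sketch under assumed semantics: a scheduler that splits work across heterogeneous servers according to a combined time/price score, then feeds measured throughput back into its speed estimates so that later rounds adapt to the actual (rather than advertised) performance. All server names, weights, and numbers are hypothetical.

# A minimal illustrative sketch (not the paper's algorithm): split a
# workload across heterogeneous cloud servers, then re-estimate each
# server's effective speed from the measured round time.

servers = {  # advertised speed (units/s) and price (cost per second)
    "A": {"speed": 10.0, "price": 0.5},
    "B": {"speed": 4.0,  "price": 0.1},
}

def split_work(total_units, weight_time=0.7, weight_price=0.3):
    """Share work in proportion to a combined time/price score."""
    scores = {
        name: weight_time * s["speed"] + weight_price * (1.0 / s["price"])
        for name, s in servers.items()
    }
    norm = sum(scores.values())
    return {name: total_units * sc / norm for name, sc in scores.items()}

def feedback(name, units_done, seconds, alpha=0.5):
    """Blend the measured throughput into the server's speed estimate."""
    measured = units_done / seconds
    servers[name]["speed"] = (1 - alpha) * servers[name]["speed"] + alpha * measured

shares = split_work(1000)
feedback("A", shares["A"], 80.0)  # round took longer than advertised
print(split_work(1000))           # next round shifts work toward "B"

The feedback step is what makes such a scheme robust to the unpredictable Internet conditions mentioned above: estimates converge toward observed throughput instead of trusting the providers' nominal figures.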
Nowadays, many different entities collect data of the same nature but in slightly different environments. For instance, different hospitals collect data about their patients' symptoms and the corresponding disease diagnoses, different banks collect transactions from their customers' bank accounts, multiple cyber-security companies collect log files and the corresponding attacks, etc. It has been shown that if these entities merged their privately collected data into a single dataset and used it to train a machine learning (ML) model, they would often end up with a trained model that outperforms human experts in the corresponding fields in terms of prediction accuracy. However, there is a drawback. Due to privacy concerns, backed by laws and ethical considerations, no entity is willing to share its privately collected data with others. The same problem appears during classification over an already trained ML model. On one hand, a user holding an unclassified query (record) wants to share with the server that owns the trained model neither the content of the query (which might contain private data such as a credit card number, IP address, etc.) nor the final prediction (classification) of the query. On the other hand, the owner of the trained model does not want to leak any parameters of the trained model to the user. To overcome these obstacles, several cryptographic and probabilistic techniques have been proposed over the last few years to enable both privacy-preserving training and privacy-preserving classification schemes. These include anonymization and k-anonymity, differential privacy, secure multiparty computation (MPC), federated learning, Private Information Retrieval (PIR), Oblivious Transfer (OT), garbled circuits, and homomorphic encryption, to name a few. Theoretical analyses and experimental results show that current privacy-preserving schemes are suitable for real-world deployment, while the accuracy of most of them differs little or not at all from that of schemes operating in a non-privacy-preserving fashion.
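As a concrete illustration of one technique from the list above, here is a minimal textbook sketch of Paillier encryption showing the additive homomorphism that many privacy-preserving classification schemes rely on: multiplying two ciphertexts yields an encryption of the sum of the two plaintexts. The toy primes below are for demonstration only and offer no real security; a deployment would use primes of roughly 1024 bits or more.

import math, random

# Textbook Paillier with toy (insecure) parameters.
p, q = 293, 433          # toy primes; real deployments use ~1024-bit primes
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

c1, c2 = encrypt(20), encrypt(22)
assert decrypt((c1 * c2) % n2) == 42  # the sum was computed under encryption

This property lets a server add a client's encrypted values (for example, the log-probability terms of a Naïve Bayes score) without ever decrypting them, which is the core idea behind several of the privacy-preserving classification schemes discussed above.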