Many organizations transact in large amounts of data often containing personal identifiable information (PII) and various confidential data. Such organizations are bound by state, federal, and international laws to ensure that the confidentiality of both individuals and sensitive data is not compromised. However, during the privacy preserving process, the utility of such datasets diminishes even while confidentiality is achieved--a problem that has been defined as NP-Hard. In this paper, we investigate a differential privacy machine learning ensemble classifier approach that seeks to preserve data privacy while maintaining an acceptable level of utility. The first step of the methodology applies a strong data privacy granting technique on a dataset using differential privacy. The resulting perturbed data is then passed through a machine learning ensemble classifier, which aims to reduce the classification error, or, equivalently, to increase utility. Then, the association between increasing the number of weak decision tree learners and data utility, which informs us as to whether the ensemble machine learner would classify more correctly is examined. As results, we found that a combined adjustment of the privacy granting noise parameters and an increase in the number of weak learners in the ensemble machine might lead to a lower classification error.
The publication and sharing of network trace data is a critical to the advancement of collaborative research among various entities, both in government, private sector, and academia. However, due to the sensitive and confidential nature of the data involved, entities have to employ various anonymization techniques to meet legal requirements in compliance with confidentiality policies. Nevertheless, the very composition of network trace data makes it a challenge when applying anonymization techniques. On the other hand, basic application of microdata anonymization techniques on network traces is problematic and does not deliver the necessary data usability. Therefore, as a contribution, we point out some of the ongoing challenges in the network trace anonymization. We then suggest usability-aware anonymization heuristics by employing microdata privacy techniques while giving consideration to usability of the anonymized data. Our preliminary results show that with trade-offs, it might be possible to generate anonymized network traces with enhanced usability, on a case-by-case basis using micro-data anonymization techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.