Concerns on location privacy frequently arise with the rapid development of GPS enabled devices and location-based applications. While spatial transformation techniques such as location perturbation or generalization have been studied extensively, most techniques rely on syntactic privacy models without rigorous privacy guarantee. Many of them only consider static scenarios or perturb the location at single timestamps without considering temporal correlations of a moving user's locations, and hence are vulnerable to various inference attacks. While differential privacy has been accepted as a standard for privacy protection, applying differential privacy in location based applications presents new challenges, as the protection needs to be enforced on the fly for a single user and needs to incorporate temporal correlations between a user's locations.In this paper, we propose a systematic solution to preserve location privacy with rigorous privacy guarantee. First, we propose a new definition, "δ-location set" based differential privacy, to account for the temporal correlations in location data. Second, we show that the well known 1-norm sensitivity fails to capture the geometric sensitivity in multidimensional space and propose a new notion, sensitivity hull, based on which the error of differential privacy is bounded. Third, to obtain the optimal utility we present a planar isotropic mechanism (PIM) for location perturbation, which is the first mechanism achieving the lower bound of differential privacy. Experiments on real-world datasets also demonstrate that PIM significantly outperforms baseline approaches in data utility.
Abstract. Differential privacy is a strong notion for protecting individual privacy in privacy preserving data analysis or publishing. In this paper, we study the problem of differentially private histogram release based on an interactive differential privacy interface. We propose two multidimensional partitioning strategies including a baseline cell-based partitioning and an innovative kd-tree based partitioning. In addition to providing formal proofs for differential privacy and usefulness guarantees for linear distributive queries , we also present a set of experimental results and demonstrate the feasibility and performance of our method.
Head and neck squamous cell carcinoma (HNSCC) is one of the most common types of human cancer and frequently metastasizes to LNs. Identifying metastasis-promoting factors is of immense clinical interest, as the prognosis for patients with even a single unilateral LN metastasis is extremely poor. Here, we report that p90 ribosomal S6 kinase 2 (RSK2) promotes human HNSCC cell invasion and metastasis. We determined that RSK2 was overexpressed and activated in highly
Set-valued data provides enormous opportunities for various data mining tasks. In this paper, we study the problem of publishing set-valued data for data mining tasks under the rigorous differential privacy model. All existing data publishing methods for set-valued data are based on partition-based privacy models, for example
k
-anonymity, which are vulnerable to privacy attacks based on background knowledge. In contrast, differential privacy provides strong privacy guarantees independent of an adversary's background knowledge and computational power. Existing data publishing approaches for differential privacy, however, are not adequate in terms of both utility and scalability in the context of set-valued data due to its high dimensionality.
We demonstrate that set-valued data could be efficiently released under differential privacy with guaranteed utility with the help of context-free taxonomy trees. We propose a probabilistic top-down partitioning algorithm to generate a differentially private release, which scales linearly with the input data size. We also discuss the applicability of our idea to the context of relational data. We prove that our result is (∈, δ)-useful for the class of counting queries, the foundation of many data mining tasks. We show that our approach maintains high utility for counting queries and frequent itemset mining and scales to large datasets through extensive experiments on real-life set-valued datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.