Differential privacy has recently emerged as the de facto standard for private data release. This makes it possible to provide strong theoretical guarantees on the privacy and utility of released data. While it is well-understood how to release data based on counts and simple functions under this guarantee, it remains to provide general purpose techniques that are useful for a wider variety of queries. In this paper, we focus on spatial data, i.e., any multidimensional data that can be indexed by a tree structure. Directly applying existing differential privacy methods to this type of data simply generates noise.We propose instead the class of "private spatial decompositions": these adapt standard spatial indexing methods such as quadtrees and kd-trees to provide a private description of the data distribution. Equipping such structures with differential privacy requires several steps to ensure that they provide meaningful privacy guarantees. Various basic steps, such as choosing splitting points and describing the distribution of points within a region, must be done privately, and the guarantees of the different building blocks must be composed into an overall guarantee. Consequently, we expose the design space for private spatial decompositions, and analyze some key examples. A major contribution of our work is to provide new techniques for parameter setting and post-processing of the output to improve the accuracy of query answers. Our experimental study demonstrates that it is possible to build such decompositions efficiently, and use them to answer a variety of queries privately and with high accuracy.
Recent years have witnessed an unprecedented proliferation of social media. People around the globe author, every day, millions of blog posts, micro-blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories, and events that capture popular attention. Stories can be identified via groups of tightly-coupled realworld entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale, and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point of time.The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightlycoupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DYNDENS, which outperforms adaptations of existing techniques to this setting, and yields meaningful results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.
Privacy is a serious concern when microdata need to be released for ad hoc analyses. The privacy goals of existing privacy protection approaches (e.g., -anonymity and -diversity)
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.