The proliferation of online social networks, and the concomitant accumulation of user data, give rise to hotly debated issues of privacy, security, and control. One specific challenge is the sharing or public release of anonymized data without accidentally leaking personally identifiable information (PII). Unfortunately, it is often difficult to ascertain that sophisticated statistical techniques, potentially employing additional external data sources, are unable to break anonymity. In this paper, we consider an instance of this problem, where the object of interest is the structure of a social network, i.e., a graph describing users and their links. Recent work demonstrates that anonymizing node identities may not be sufficient to keep the network private: the availability of node and link data from another domain, which is correlated with the anonymized network, has been used to re-identify the anonymized nodes. This paper is about the conditions under which such a de-anonymization process is possible. We attempt to shed light on the following question: can we assume that a sufficiently sparse network is inherently anonymous, in the sense that even with unlimited computational power, de-anonymization is impossible? Our approach is to introduce a random graph model for a version of the de-anonymization problem, parameterized by the expected node degree and a similarity parameter that controls the correlation between two graphs over the same vertex set. We find simple conditions on these parameters delineating the boundary of privacy, and show that the mean node degree need only grow slightly faster than log n with the network size n for nodes to be identifiable. Our results have policy implications for the sharing of anonymized network information.
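As an illustration of such a model, the sketch below samples a hidden Erdős–Rényi graph and derives two correlated observed copies by retaining each edge independently in each copy. The names n, p, and s and the particular correlation mechanism are assumptions for illustration, not necessarily the paper's exact construction.

```python
import itertools
import math
import random

def correlated_graph_pair(n, p, s, seed=0):
    """Sample a hidden G(n, p) graph, then derive two observed graphs by
    keeping each hidden edge independently with probability s in each copy.
    The similarity parameter s controls the correlation between the copies;
    the expected degree in each copy is roughly n * p * s."""
    rng = random.Random(seed)
    g1, g2 = set(), set()
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:            # edge present in the hidden graph
            if rng.random() < s:
                g1.add((u, v))          # edge survives into graph 1
            if rng.random() < s:
                g2.add((u, v))          # edge survives into graph 2
    return g1, g2

# Illustrative parameters in the regime the abstract describes:
# mean degree growing slightly faster than log n.
n = 1000
p = math.log(n) ** 1.2 / n
g1, g2 = correlated_graph_pair(n, p, s=0.9)
print(len(g1), len(g2))
```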
Approximate graph matching (AGM) refers to the problem of mapping the vertices of two structurally similar graphs, a problem with applications in social networks, computer vision, chemistry, and biology. Given its computational cost, AGM has mostly been limited either to small graphs (e.g., tens or hundreds of nodes) or to large graphs in combination with side information beyond the graph structure (e.g., a seed set of pre-mapped node pairs). In this paper, we cast AGM in a Bayesian framework based on a clean definition of the probability of correctly mapping two nodes, which leads to a polynomial-time algorithm that does not require side information. Node features such as degree and distances to other nodes are used as fingerprints. The algorithm proceeds in rounds, such that the most likely pairs are mapped first; these pairs subsequently generate additional features in the fingerprints of other nodes. We evaluate our method on real social networks and show that it achieves a very low matching error provided the two graphs are sufficiently similar. We also evaluate it on random graph models to characterize its behavior under various levels of node clustering.
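A minimal sketch of one way such a round-based, fingerprint-driven matcher can look: nodes are scored greedily on degree plus BFS distances to already-mapped anchor pairs. The function names, scoring rule, and round schedule here are illustrative assumptions, not the paper's Bayesian algorithm.

```python
import networkx as nx

def match_by_fingerprints(G1, G2, rounds=5, pairs_per_round=10):
    """Greedy fingerprint-matching sketch: nodes are compared on degree
    plus BFS distances to already-mapped anchor pairs; the most similar
    pairs are mapped first, and each newly mapped pair adds a coordinate
    to every remaining node's fingerprint."""
    mapped = {}                                   # G1 node -> G2 node
    for _ in range(rounds):
        # distances from every node to the current anchors on each side
        d1 = {a: nx.single_source_shortest_path_length(G1, a) for a in mapped}
        d2 = {b: nx.single_source_shortest_path_length(G2, b) for b in mapped.values()}

        def fingerprint(G, v, dists):
            return [G.degree(v)] + [d.get(v, -1) for d in dists.values()]

        used = set(mapped.values())
        candidates = []
        for u in G1:
            if u in mapped:
                continue
            fu = fingerprint(G1, u, d1)
            for v in G2:
                if v in used:
                    continue
                fv = fingerprint(G2, v, d2)
                candidates.append((sum(abs(a - b) for a, b in zip(fu, fv)), u, v))

        # commit the most confident (lowest-score) pairs this round
        for _, u, v in sorted(candidates)[:pairs_per_round]:
            if u not in mapped and v not in set(mapped.values()):
                mapped[u] = v
    return mapped

# Example: match a graph against an identical copy of itself.
G1 = nx.erdos_renyi_graph(60, 0.1, seed=1)
G2 = G1.copy()
mapping = match_by_fingerprints(G1, G2)
print(sum(1 for u, v in mapping.items() if u == v), "of", len(mapping), "correct")
```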
In recommender systems, a central server usually needs access to users' profiles in order to generate useful recommendations, but this access undermines the users' privacy: the more information about user-item relations is revealed to the server, the lower the users' privacy. Yet hiding parts of the profiles to increase privacy comes at the cost of recommendation accuracy or implementation difficulty. In this paper, we propose a distributed mechanism by which users augment their profiles in a way that obfuscates the user-item connections from an untrusted server, with minimal loss in the accuracy of the recommender system. We still rely on the central server to generate the recommendations; however, each user stores their profile offline, modifies it by partially merging it with the profiles of similar users through direct contact with them, and only then periodically uploads it to the server. We also propose a metric, based on graph matching concepts, to measure privacy at the system level. Applying our method to the Netflix Prize dataset, we show that the algorithm effectively balances the tradeoff between privacy and accuracy in recommender systems in a practical way.
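A toy sketch of the offline merge step appears below. The merge fraction, the rating-adoption rule, and all names are hypothetical; the paper's actual mechanism and privacy metric are more involved.

```python
import random

def obfuscate_profile(own, peer, merge_frac=0.2, seed=0):
    """Sketch of one obfuscation step: before uploading to the server,
    a user copies a random fraction of a similar peer's item ratings
    into their own profile, blurring the true user-item links the
    server can observe."""
    rng = random.Random(seed)
    merged = dict(own)
    foreign = [item for item in peer if item not in own]
    for item in rng.sample(foreign, int(merge_frac * len(foreign))):
        merged[item] = peer[item]        # adopt the peer's rating
    return merged

alice = {"m1": 5, "m2": 3}
bob = {"m2": 4, "m3": 2, "m4": 5, "m5": 1}
print(obfuscate_profile(alice, bob, merge_frac=0.5))
```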
During the past decade, a number of studies have identified several peculiar properties of networks arising from a diverse universe, ranging from social to computer networks. A recently observed feature is network densification, which occurs when the number of edges grows much faster than the number of nodes as the network evolves over time. This surprising phenomenon has been empirically validated in a variety of real-world networks, and mathematical models have recently been proposed to explain it. Leveraging how real data are usually gathered and used, we propose a new model, called Edge Sampling, to explain how densification can arise. Our model is novel in that it considers a fixed underlying graph and a process that discovers this graph by probabilistically sampling its edges. We show that this model possesses several interesting features; in particular, the discovered edges and nodes can exhibit densification. Moreover, when the node degrees of the fixed underlying graph follow a heavy-tailed distribution, we show that the Edge Sampling model can yield power-law densification, establishing an approximate relationship between the degree exponent and the densification exponent. The theoretical findings are supported by numerical evaluations of the model. Finally, we apply our model to real network data to evaluate how well it captures the previously observed densification. Our results indicate that edge sampling is indeed a plausible alternative explanation for the densification phenomenon observed in practice.
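A small simulation in the spirit of this model: edges of a fixed preferential-attachment graph are sampled uniformly without replacement, and a node counts as discovered when one of its edges is first sampled. This uniform-sampling variant, the graph choice, and all sizes are illustrative simplifications of the paper's probabilistic discovery process.

```python
import math
import random
import networkx as nx

# Fixed underlying graph with heavy-tailed degrees (illustrative choice).
G = nx.barabasi_albert_graph(20000, 5, seed=1)
edges = list(G.edges())
random.Random(1).shuffle(edges)

# Discover the graph edge by edge, recording (nodes seen, edges seen).
seen_nodes, n_pts, e_pts = set(), [], []
for i, (u, v) in enumerate(edges, start=1):
    seen_nodes.update((u, v))
    if i % 1000 == 0:
        n_pts.append(len(seen_nodes))
        e_pts.append(i)

# Crude densification-exponent estimate: slope of log(edges) vs log(nodes).
xs = [math.log(n) for n in n_pts]
ys = [math.log(e) for e in e_pts]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"estimated densification exponent ~ {slope:.2f}")
```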