Graph matching is a generalization of the classic graph isomorphism problem. By using only their structures a graph-matching algorithm finds a map between the vertex sets of two similar graphs. This has applications in the deanonymization of social and information networks and, more generally, in the merging of structural data from different domains.One class of graph-matching algorithms starts with a known seed set of matched node pairs. Despite the success of these algorithms in practical applications, their performance has been observed to be very sensitive to the size of the seed set. The lack of a rigorous understanding of parameters and performance makes it difficult to design systems and predict their behavior.In this paper, we propose and analyze a very simple percolation -based graph matching algorithm that incrementally maps every pair of nodes (i, j) with at least r neighboring mapped pairs. The simplicity of this algorithm makes possible a rigorous analysis that relies on recent advances in bootstrap percolation theory for the G(n, p) random graph. We prove conditions on the model parameters in which percolation graph matching succeeds, and we establish a phase transition in the size of the seed set. We also confirm through experiments that the performance of percolation graph matching is surprisingly good, both for synthetic graphs and real social-network data.
Abstract-Network alignment refers to the problem of matching the vertex sets of two unlabeled graphs, which can be viewed as a generalization of the classic graph isomorphism problem. Network alignment has applications in several fields, including social network analysis, privacy, pattern recognition, computer vision, and computational biology. A number of heuristic algorithms have been proposed in these fields. Recent progress in the analysis of network alignment over stochastic models sheds light on the interplay between network parameters and matchability.In this paper, we consider the alignment problem when the two networks overlap only partially, i.e., there exist vertices in one network that have no counterpart in the other. We define a random bigraph model that generates two correlated graphs G1,2; it is parameterized by the expected node overlap t 2 and by the expected edge overlap s 2 . We define a cost function for structural mismatch under a particular alignment, and we identify a threshold for perfect matchability: if the average node degrees of G1,2 grow as ω (s −2 t −1 log(n) , then minimization of the proposed cost function results in an alignment which (i) is over exactly the set of shared nodes between G1 and G2, and (ii) agrees with the true matching between these shared nodes. Our result shows that network alignment is fundamentally robust to partial edge and node overlaps.
Abstract-Many graph mining and network analysis problems rely on the availability of the full network over a set of nodes. But inferring a full network is sometimes non-trivial if the raw data is in the form of many small patches or subgraphs, of the true network, and if there are ambiguities in the identities of nodes or edges in these patches. This may happen because of noise or because of the nature of data; for instance, in social networks, names are typically not unique. Graph assembly refers to the problem of reconstructing a graph from these many, possibly noisy, partial observations. Prior work suggests that graph assembly is essentially impossible in regimes of interest when the true graph is Erdős-Rényi . The purpose of the present paper is to show that a modest amount of clustering is sufficient to assemble even very large graphs.We introduce the G(n, p; q) random graph model, which is the random closure over all open triangles of a G(n, p) Erdős-Rényi , and show that this model exhibits higher clustering than an equivalent Erdős-Rényi . We focus on an extreme case of graph assembly: the patches are small (1-hop egonets) and are unlabeled. We show that in realistic regimes, graph assembly is fundamentally feasible, because we can identify, for every edge e, a subgraph induced by its neighbors that is unique and present in every patch containing e. Using this result, we build a practical algorithm that uses canonical labeling to reconstruct the original graph from noiseless patches. We also provide an achievability result for noisy patches, which are obtained by edge-sampling the original egonets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.