Besides its NP-completeness, the strict constraints of subgraph isomorphism are making it impractical for graph pattern matching (GPM) in the context of big data. As a result, relaxed GPM models have emerged as they yield interesting results in a polynomial time. However, massive graphs generated by mostly social networks require a distributed storing and processing of the data over multiple machines, thus, requiring GPM to be revised by adopting new paradigms of big graphs processing, e.g., Think-Like-A-Vertex and its derivatives. This article discusses and proposes a classification of distributed GPM approaches with a narrow focus on the relaxed models.
Prior algorithms on graph simulation for distributed graphs are not scalable enough as they exhibit heavy message passing. Moreover, they are dependent on the graph partitioning quality that can be a bottleneck due to the natural skew present in real-world data. As a result, their degree of parallelism becomes limited. In this paper, we propose an efficient parallel edge-centric approach for distributed graph pattern matching. We design a novel distributed data structure called ST that allows a fine-grain parallelism, and hence guarantees linear scalability. Based on ST, we develop a parallel graph simulation algorithm called PGSim. Furthermore, we propose PDSim, an edge-centric algorithm that efficiently evaluates dual simulation in parallel. PDSim combines ST and PGSim in a Split-and-Combine approach to accelerate the computation stages. We prove the effectiveness and efficiency of these propositions through theoretical guarantees and extensive experiments on massive graphs. The achieved results confirm that our approach outperforms existing algorithms by more than an order of magnitude. KeywordsGraph pattern matching • Subgraph matching • Graph simulation • Dual simulation • Massive graph • Parallel algorithm 1 Introduction Graph Pattern Matching (GPM) has been used in many application domains such as software plagiarism detection, in-database analytics and search engines [29, 34]. It allows finding answers to an input query graph in a relatively larger data graph. GPM can be answered through different models that define the matching constraints based on the query graph. Subgraph isomorphism, which is an NP-Complete problem [17], is the most studied model in this domain. It defines a bijective mapping between the query graph vertices and the data graph vertices, such mapping must preserve the topology of the query graph. Among the most important studies on subgraph isomorphism, we find Ullman's algorithm [44], VF2 [5], QuickSI [41], GraphQL [20], SPath [51], Turbo iso [19], Boost iso [37, 45] and CFL-Match [2]. In the context of large graphs, various works addressed subgraph isomor-
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.