Towards Practical and Robust Labeled Pattern Matching in Trillion-Edge Graphs

Reza, Tahsin; Klymko, Christine; Ripeanu, Matei; Sanders, Geoffrey; Pearce, Roger

doi:10.1109/cluster.2017.85

Cited by 13 publications

(12 citation statements)

References 28 publications


“…4.1.4 Asynchronous Vertex-centric: Reza et al [65] proposed recently a distributed algorithm for evaluating subgraph isomorphism on one trillion-edge graphs based on HavoqGT, a platform for asynchronous vertex-centric graph processing that was developed in 2013 [59]. The algorithm is composed of two phases.…”

Section: Synchronous Vertex-centric (mentioning)

confidence: 99%

“…Master-slave | 778M edges
[81] | 2012 | Subgraph isomorphism | Async. | Master-slave | 1B nodes
[27] | 2014 | Subgraph isomorphism | BSP | Vertex-centric | 100K nodes
[32] | 2014 | Inexact ISO | BSP | Vertex-centric | 105M nodes
[74] | 2014 | Subgraph isomorphism | BSP | Vertex-centric | 42M nodes
[60] | 2016 | Subgraph isomorphism | BSP | Master-slave | 1.3B nodes
[65] | 2017 | Inexact ISO | Async. | Vertex-centric | 68B nodes
[67] | 2018 | Subgraph isomorphism | Async.…”

Section: Work Year Model (mentioning)

confidence: 99%


A Survey on Distributed Graph Pattern Matching in Massive Graphs

et al. 2021


Besides its NP-completeness, the strict constraints of subgraph isomorphism make it impractical for graph pattern matching (GPM) in the context of big data. As a result, relaxed GPM models have emerged, as they yield interesting results in polynomial time. However, massive graphs, generated mostly by social networks, require the data to be stored and processed in a distributed fashion over multiple machines, which in turn requires GPM to be revised by adopting new paradigms of big graph processing, e.g., Think-Like-A-Vertex and its derivatives. This article discusses and proposes a classification of distributed GPM approaches with a narrow focus on the relaxed models.


“…Later on [29] improves the performance of subgraph matching up to three orders of magnitude by postponing the Cartesian products based on the structure of a query to minimize the redundant Cartesian products. [30], [31] provides a pruning method on labeled networks and graphlets to reduce the vertex number by orders of magnitude prior to the actual counting.…”

Section: Related Work (mentioning)

confidence: 99%

SubGraph2Vec: Highly-Vectorized Tree-like Subgraph Counting

Chen, Sahinalp, et al. 2019

2019 IEEE International Conference on Big Data (Big Data)


Subgraph counting aims to count occurrences of a template T in a given network G(V, E). It is a powerful graph analysis tool and has found real-world applications in diverse domains. Scaling subgraph counting problems is known to be memory bounded and computationally challenging with exponential complexity. Although scalable parallel algorithms are known for several graph problems such as Triangle Counting and PageRank, this is not common for counting complex subgraphs. Here we address this challenge and study connected acyclic graphs or trees. We propose a novel vectorized subgraph counting algorithm, named SUBGRAPH2VEC, as well as both shared memory and distributed implementations: 1) reducing algorithmic complexity by minimizing neighbor traversal; 2) achieving a highly-vectorized implementation upon linear algebra kernels to significantly improve performance and hardware utilization. 3) SUBGRAPH2VEC improves the overall performance over the state-of-the-art work by orders of magnitude and up to 660x on a single node. 4) SUBGRAPH2VEC in distributed mode can scale up the template size to 20 and maintain good strong scalability. 5) enabling portability to both CPU and GPU.


“…Contributions. This paper serves two goals: first, it is a synthesis of an ongoing long-term project [Reza et al 2017; and, second, it presents new system features, usage scenarios, empirical experiments, and comparisons with related projects, that strengthen the confidence that pattern matching based on iterative pruning via constraint checking is an effective and scalable approach. The list of contributions presented below is organized with this dual goal in mind: on the one side, it aims to offer an overall project roadmap, and, on the other side, it highlights the new experiments and the insights they bring forth.…”

Section: Introduction (mentioning)

confidence: 99%

“…We show ] that these constraints eliminate all and only non-matching vertices and edges (thus offering full precision and recall) for arbitrary templates. We identify various subclasses of search templates (e.g., acyclic and edge-monocyclic with no duplicate labels) that can be extremely effectively supported [Reza et al 2017].…”

Section: Introduction (mentioning)

confidence: 99%

Scalable Pattern Matching in Metadata Graphs via Constraint Checking

Reza, Halawa, Ripeanu, et al. 2019

Preprint

Self Cite


Pattern matching is a fundamental tool for answering complex graph queries. Unfortunately, existing solutions have limited capabilities: they do not scale to process large graphs and/or support only a restricted set of search templates or usage scenarios. Moreover, the algorithms at the core of the existing techniques are not suitable for today's graph processing infrastructures relying on horizontal scalability and shared-nothing clusters, as most of these algorithms are inherently sequential and difficult to parallelize.

We present an algorithmic pipeline that bases pattern matching on constraint checking. The key intuition is that each vertex or edge participating in a match has to meet a set of constraints implicitly specified by the search template. These constraints can be verified independently and, typically, are less expensive to compute than searching the full template. The pipeline we propose iterates over these constraints to eliminate all the vertices and edges that do not participate in any match and reduces the background graph to a subgraph which is the union of all matches: the complete set of all vertices and edges that participate in at least one match. Additional analysis can be performed on this annotated, reduced graph, such as full match enumeration, match counting, or vertex/edge centrality. Furthermore, a vertex-centric formulation for constraint checking algorithms exists, and this makes it possible to harness existing high-performance, vertex-centric graph processing frameworks.

The key contribution of this work is a design following the constraint checking approach for exact matching and its experimental evaluation. We show that the proposed technique: (i) enables highly scalable pattern matching on labeled graphs, (ii) supports arbitrary patterns with 100% precision, (iii) always selects all vertices and edges that participate in matches, thus offering 100% recall, and (iv) supports a set of popular data analytics scenarios. We implement our approach on top of HavoqGT, an open-source asynchronous graph processing framework, and demonstrate its advantages through strong and weak scaling experiments on massive-scale real-world (up to 257 billion edges) and synthetic (up to 4.4 trillion edges) labeled graphs respectively, and at scales (1,024 nodes / 36,864 cores) orders of magnitude larger than used in the past for similar problems. Extensive comparisons with three state-of-the-art systems confirm the advantages of our approach.
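The pruning idea described in this abstract can be illustrated with a minimal, single-machine sketch. It is not the authors' HavoqGT implementation: the function name `prune_by_local_constraints`, the dictionary-based graph layout, and the toy data are all assumptions made for illustration. The sketch checks only a local constraint (each surviving vertex must have, for every neighbor of its template vertex, at least one neighbor that is still a candidate for it) and repeats until nothing changes. This local check alone should not be expected to give exact results for arbitrary templates, for example ones with cycles or repeated labels.

```
# Hypothetical sketch of iterative pruning via local constraint checking.
# Graphs are undirected adjacency dicts: vertex -> set of neighbouring vertices.

def prune_by_local_constraints(graph, labels, template, template_labels):
    # candidates[v]: template vertices that background vertex v may still match.
    candidates = {
        v: {t for t in template if template_labels[t] == labels[v]}
        for v in graph
    }
    changed = True
    while changed:
        changed = False
        for v in graph:
            for t in list(candidates[v]):
                # Every template neighbour of t needs at least one candidate
                # among the background neighbours of v.
                satisfied = all(
                    any(tn in candidates[u] for u in graph[v])
                    for tn in template[t]
                )
                if not satisfied:
                    candidates[v].discard(t)
                    changed = True
    # Keep only vertices that still match some template vertex.
    return {v: c for v, c in candidates.items() if c}


# Toy template: a path with labels A - B - C.
template = {0: {1}, 1: {0, 2}, 2: {1}}
template_labels = {0: "A", 1: "B", 2: "C"}

# Toy background graph: a-b-c is a match; d-e lacks a C; f is isolated.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"},
         "d": {"e"}, "e": {"d"}, "f": set()}
labels = {"a": "A", "b": "B", "c": "C", "d": "A", "e": "B", "f": "C"}

survivors = prune_by_local_constraints(graph, labels, template, template_labels)
print(sorted(survivors))
```

In this toy run, `e` is dropped first because it has no neighbor labeled C, and `d` is dropped on a later pass because its only neighbor is no longer a candidate, which is why the check has to be iterated to a fixed point.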

