Subgraph Counting: Color Coding Beyond Trees

Chakaravarthy, Venkatesan T.; Kapralov, Michael; Murali, Prakash; Petrini, Fabrizio; Que, Xinyu; Sabharwal, Yogish; Schieber, Baruch

doi:10.1109/ipdps.2016.122

Cited by 19 publications

(16 citation statements)

References 31 publications

(81 reference statements)


“…A subsequent distributed scalable implementation of CC, SCALA [20], allowed the authors to count on graphs with 1-2M nodes the number of non-induced paths and trees. Another recent effort to scale CC is [7]: using a distributed algorithm, the authors estimate the occurrences of 10 different subgraphs of treewidth 2 and size up to k = 10 nodes, in graphs of up to 2M nodes. While these encouraging results make clear that CC is a promising approach, they leave wide open the important question of estimating the distribution of induced subgraphs, aka graphlets.…”

Section: Related Work (mentioning)

confidence: 99%


Motif Counting Beyond Five Nodes

Bressan, Chierichetti, Kumar, et al. 2018. ACM Trans. Knowl. Discov. Data


Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC's memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets. CCS Concepts: • Mathematics of computing → Graph enumeration; Graph algorithms; • Theory of computation → Random walks and Markov chains; • Information systems → Data mining; Web mining;



“…Counting graphlets is a well-studied problem in graph mining and social-networks analysis [1,3,7,8,11,14,18,20,27,28,29,32]. Given an input graph, the problem asks to count the frequencies of all induced connected subgraphs (called graphlets), up to isomorphism, of a certain size.…”

Section: Introduction (mentioning)

confidence: 99%

Motif Counting Beyond Five Nodes

Bressan, Chierichetti, Kumar, et al. 2018. ACM Trans. Knowl. Discov. Data


“…Given a k-node template T, it assigns random colors between 0 and k−1 to each vertex of a network graph G, and it counts the occurrences of colorful embeddings, which are isomorphic to T while having a distinct color on each vertex. Both theoretical proof [9], [14], [6] and experiments [3], [15] show that, with proper normalization, the count of colorful embeddings is an unbiased estimator of the actual count of embeddings. Alon et al. [9] proved a guarantee of bounding the count by (1 ± ε) emb(T, G) with a probability of 1 − 2δ after running at most N iterations of the algorithm.…”

Section: B Statement Of Problem (mentioning)

confidence: 92%
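The color-coding estimator described in the statement above can be sketched for the simplest tree template, a path on k vertices. This is a minimal illustration, not the implementation of any cited paper: the function names, the adjacency-list input, and the choice of a path template are all assumptions. A fixed k-vertex occurrence is colorful with probability k!/k^k under uniform random coloring, so multiplying the colorful count by k^k/k! gives an unbiased estimate, and averaging over several colorings reduces the variance.

```python
import random
from math import factorial


def count_colorful_paths(adj, colors, k):
    """Count undirected k-vertex paths whose vertices all have distinct colors.

    adj is an adjacency list (hypothetical input format); colors[v] is in 0..k-1.
    """
    n = len(adj)
    # dp[v] maps a color bitmask to the number of colorful directed paths
    # ending at v that use exactly that set of colors.
    dp = [{1 << colors[v]: 1} for v in range(n)]
    for _ in range(k - 1):
        nxt = [dict() for _ in range(n)]
        for u in range(n):
            for mask, cnt in dp[u].items():
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if not (mask & bit):  # extend only with an unused color
                        nxt[v][mask | bit] = nxt[v].get(mask | bit, 0) + cnt
        dp = nxt
    directed = sum(cnt for table in dp for cnt in table.values())
    # Each undirected path with k > 1 vertices is found once per direction.
    return directed // 2 if k > 1 else directed


def estimate_paths(adj, k, iterations, seed=0):
    """Average the normalized colorful count over independent random colorings."""
    rng = random.Random(seed)
    scale = k ** k / factorial(k)  # inverse of the probability of being colorful
    total = 0
    for _ in range(iterations):
        colors = [rng.randrange(k) for _ in adj]
        total += count_colorful_paths(adj, colors, k)
    return scale * total / iterations
```

Distinct colors force distinct vertices, which is why the dynamic program only needs to track color sets rather than visited vertices; that is the source of the technique's efficiency for tree-like templates.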

“…[5] • Computing kernel of other algorithms: Sub-tree counting is one of the computing kernels of bounded treewidth subgraph (such as circles, cactus graphs, series-parallel graphs etc.) counting problem [6] and also the kernel of network clustering [7]. Although subgraph counting plays an important role in the discovery of patterns in a graph network, counting the exact number of subgraphs of size k in an n-vertex network takes O(n^k) time [4], which is computationally challenging even for moderate values of n and k. In fact, determining whether a graph G contains a subgraph isomorphic to H is the related subgraph isomorphism problem, which is NP-complete [8].…”

Section: Introduction (mentioning)

confidence: 99%
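To make the O(n^k) cost of exact counting in the statement above concrete, here is a brute-force sketch that tries every ordered choice of k distinct graph vertices as the image of the template. It is an illustration under assumed conventions (vertices numbered from 0, edges given as pairs, hypothetical function names), not a method taken from the cited work.

```python
from itertools import permutations


def count_embeddings(template_edges, k, graph_edges, n):
    """Count injective maps from template vertices 0..k-1 to graph vertices
    0..n-1 that send every template edge to a graph edge (non-induced)."""
    edge_set = {frozenset(e) for e in graph_edges}
    count = 0
    # n! / (n - k)! candidate images, i.e. O(n^k) for fixed k.
    for image in permutations(range(n), k):
        if all(frozenset((image[a], image[b])) in edge_set
               for a, b in template_edges):
            count += 1
    return count


def count_subgraph_copies(template_edges, k, graph_edges, n):
    """Divide out the template's automorphisms so each copy is counted once."""
    automorphisms = count_embeddings(template_edges, k, template_edges, k)
    return count_embeddings(template_edges, k, graph_edges, n) // automorphisms
```

For example, a 3-vertex path template `[(0, 1), (1, 2)]` in a triangle has 6 embeddings and 2 automorphisms, giving 3 copies. The number of candidate images grows as n^k, which is why sampling-based estimators are used on large graphs.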

SubGraph2Vec: Highly-Vectorized Tree-like Subgraph Counting

Chen, Sahinalp, et al. 2019. 2019 IEEE International Conference on Big Data (Big Data)


Subgraph counting aims to count occurrences of a template T in a given network G(V, E). It is a powerful graph analysis tool and has found real-world applications in diverse domains. Scaling subgraph counting problems is known to be memory bounded and computationally challenging with exponential complexity. Although scalable parallel algorithms are known for several graph problems such as Triangle Counting and PageRank, this is not common for counting complex subgraphs. Here we address this challenge and study connected acyclic graphs or trees. We propose a novel vectorized subgraph counting algorithm, named SUBGRAPH2VEC, as well as both shared memory and distributed implementations: 1) reducing algorithmic complexity by minimizing neighbor traversal; 2) achieving a highly-vectorized implementation upon linear algebra kernels to significantly improve performance and hardware utilization. 3) SUBGRAPH2VEC improves the overall performance over the state-of-the-art work by orders of magnitude and up to 660x on a single node. 4) SUBGRAPH2VEC in distributed mode can scale up the template size to 20 and maintain good strong scalability. 5) enabling portability to both CPU and GPU.


“…in a graph. Graphlet counting has a long and rich history, which began with triangle counting and received intense interest in recent years [2,6,7,10,12,15,17,20,21,25,26,27,30]. Since exact graphlet counting is notoriously hard, one must resort to approximate probabilistic counting to obtain algorithms with an acceptable practical performance.…”

Section: Introduction (mentioning)

confidence: 99%

Motivo (2019)


The randomized technique of color coding is behind state-of-the-art algorithms for estimating graph motif counts. Those algorithms, however, are not yet capable of scaling well to very large graphs with billions of edges. In this paper we develop novel tools for the "motif counting via color coding" framework. As a result, our new algorithm, motivo, scales to much larger graphs while at the same time providing more accurate motif counts than ever before. This is achieved thanks to two types of improvements. First, we design new succinct data structures for fast color coding operations, and a biased coloring trick that trades accuracy versus resource usage. These optimizations drastically reduce the resource requirements of color coding. Second, we develop an adaptive motif sampling strategy, based on a fractional set cover problem, that breaks the additive approximation barrier of standard sampling. This gives multiplicative approximations for all motifs at once, allowing us to count not only the most frequent motifs but also extremely rare ones. To give an idea of the improvements, in 40 minutes motivo counts 7-node motifs on a graph with 65M nodes and 1.8B edges; this is 30 and 500 times larger than the state of the art, respectively in terms of nodes and edges. On the accuracy side, in one hour motivo produces accurate counts of ≈ 10,000 distinct 8-node motifs on graphs where state-of-the-art algorithms fail even to find the second most frequent motif. Our method requires just a high-end desktop machine. These results show how color coding can bring motif mining to the realm of truly massive graphs using only ordinary hardware.
