Subgraph enumeration aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph. As the subgraph isomorphism operation is computationally intensive, researchers have recently focused on solving this problem in distributed environments, such as MapReduce and Pregel. Among them, the state-of-the-art algorithm, Twin TwigJoin, is proven to be instance optimal based on a left-deep join framework. However, it is still not scalable to large graphs because of the constraints in the left-deep join framework and that each decomposed component (join unit) must be a star. In this paper, we propose SEED - a scalable sub-graph enumeration approach in the distributed environment. Compared to Twin TwigJoin, SEED returns optimal solution in a generalized join framework without the constraints in Twin TwigJoin. We use both star and clique as the join units, and design an effective distributed graph storage mechanism to support such an extension. We develop a comprehensive cost model, that estimates the number of matches of any given pattern graph by considering power-law degree distribution in the data graph. We then generalize the left-deep join framework and develop a dynamic-programming algorithm to compute an optimal bushy join plan. We also consider overlaps among the join units. Finally, we propose clique compression to further improve the algorithm by reducing the number of the intermediate results. Extensive performance studies are conducted on several real graphs, one containing billions of edges. The results demonstrate that our algorithm outperforms all other state-of-the-art algorithms by more than one order of magnitude.
Efficiently computing k-edge connected components in a large graph, G = (V, E), where V is the vertex set and E is the edge set, is a long standing research problem. It is not only fundamental in graph analysis but also crucial in graph search optimization algorithms. Consider existing techniques for computing k-edge connected components are quite time consuming and are unlikely to be scalable for large scale graphs, in this paper we firstly propose a novel graph decomposition paradigm to iteratively decompose a graph G for computing its k-edge connected components such that the number of drilling-down iterations h is bounded by the "depth" of the k-edge connected components nested together to form G, where h usually is a small integer in practice. Secondly, we devise a novel, efficient threshold-based graph decomposition algorithm, with time complexity O(l × |E|), to decompose a graph G at each iteration, where l usually is a small integer with l ≪ |V|. As a result, our algorithm for computing k-edge connected components significantly improves the time complexity of an existing state-of-the-art technique from O(|V| 2 |E| + |V| 3 log |V|) to O(h × l × |E|). Finally, we conduct extensive performance studies on large real and synthetic graphs. The performance studies demonstrate that our techniques significantly outperform the state-of-the-art solution by several orders of magnitude.
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.