Towards compressing Web graphs

Adler, Micah; Mitzenmacher, Michael

doi:10.1109/dcc.2001.917151

Cited by 148 publications

(115 citation statements)

References 13 publications

Supporting

Mentioning

115

Contrasting


“…The structural connectivity of the Web modeled as the Web graph is an example which presently contains billions of vertices and the number is growing [1]. As a result, compact representation of such graphs for use in various algorithms has been in interest [2,3,4,5]. Planar (and almost planar) graphs which capture various structural artifacts such as road networks, form another example of graphs whose space-efficient representation is crucial due to their massive size.…”

Section: Introduction (mentioning)

confidence: 99%

Succinct Representations of Separable Graphs

Blelloch

Farzan

2010

Lecture Notes in Computer Science


Abstract. We consider the problem of highly space-efficient representation of separable graphs while supporting queries in constant time in the RAM with logarithmic word size. In particular, we show constant-time support for adjacency, degree and neighborhood queries. For any monotone class of separable graphs, the storage requirement of the representation is optimal to within lower order terms. Separable graphs are those that admit an O(n^c)-separator theorem where c < 1. Many graphs that arise in practice are indeed separable. For instance, graphs with a bounded genus are separable. In particular, planar graphs (genus 0) are separable and our scheme gives the first succinct representation of planar graphs with a storage requirement that matches the information-theory minimum to within lower order terms with constant time support for the queries. We further show that we can also modify the scheme to succinctly represent the combinatorial planar embedding of planar graphs (and hence encode planar maps).


“…Most known methods for structural (graph) compression are of heuristic nature. For example, Adler and Mitzenmacher [1] proposed a heuristic method for web graph compression, and similar idea has been used in [21] for compressing sparse graphs. Recently, attention has shifted to grammar compression for data structures: Peshkin [16] proposed an algorithm for a graphical extension of the one-dimensional SEQUITUR compression method.…”

Section: Introduction (mentioning)

confidence: 99%

“…In 1990, Naor [15] proposed such a representation that is optimal up to the first two leading terms when all unlabeled graphs are equally likely. In this paper, we solve Turan's problem for a larger class of graphs, in particular for the Erdős-Rényi random graphs in which edges are added randomly with probability p. Naor's result is asymptotically a special case of ours when p = 1/2.…”

Section: Introduction (mentioning)

confidence: 99%

Fast Algorithm for Optimal Compression of Graphs

Choi

2010

2010 Proceedings of the Seventh Workshop on Analytic Algorithmics and Combinatorics (ANALCO)


We consider the problem of finding optimal description for general unlabeled graphs. Given a probability distribution on labeled graphs, we introduced in [4] a structural entropy as a lower bound for the lossless compression of such graphs. Specifically, we proved that the structural entropy for the Erdős-Rényi random graph, in which edges are added with probability p, is $\binom{n}{2} h(p) - n \log n + O(n)$, where n is the number of vertices and $h(p) = -p \log p - (1 - p) \log(1 - p)$ is the entropy rate of a conventional memoryless binary source. In this paper, we prove the asymptotic equipartition property for such graphs. Then, we propose a faster compression algorithm that asymptotically achieves the structural entropy up to the first two leading terms with high probability. Our algorithm runs in O(n + e) time on average where e is the number of edges. To prove its asymptotic optimality, we introduce binary trees that one can classify as in-between tries and digital search trees. We use analytic techniques such as generating functions, Mellin transform, and poissonization to establish our findings. Our experimental results confirm theoretical results and show the usefulness of our algorithm for real-world graphs such as the Internet, biological networks, and social networks.

Introduction

Brooks argues in [3] that there is "no theory that gives us a metric for information embodied in structure." Shannon himself alluded to it fifty years earlier in his little known 1953 paper [20]. In fact, Brooks emphasizes the importance of the quantification of information in physical structure. In computer science, however, it is more important to understand the information embodied in abstract structures that are of our particular interests in this paper. For instance, how can we quantify the amount of information in the structure of graphs such as the Internet, social networks, and biological networks? How can we understand and utilize the "structure" of non-conventional data structures such as biological data, topographical maps, medical data, and volumetric data? As the first step to understanding information in such structures, we focus on structure in graphs.
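As a rough illustration of the structural entropy expression quoted in the abstract above, the following sketch evaluates only its two leading terms, C(n,2)·h(p) − n·log n. It assumes base-2 logarithms (so the result is in bits) and drops the O(n) term entirely; the function names are hypothetical and do not come from the cited paper.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) = -p log2 p - (1 - p) log2 (1 - p), with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def structural_entropy_leading_terms(n: int, p: float) -> float:
    """Two leading terms, C(n, 2) * h(p) - n * log2(n).

    Assumption: logs are base 2; the O(n) remainder is ignored,
    so this is only an approximation for large n.
    """
    return math.comb(n, 2) * binary_entropy(p) - n * math.log2(n)


if __name__ == "__main__":
    # Example: n = 1024 vertices, edge probability p = 1/2.
    print(structural_entropy_leading_terms(1024, 0.5))
```

For p = 1/2 the first term equals the number of vertex pairs, so the n·log n correction can be read as the approximate saving from not storing vertex labels.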


“…The WebGraph compression method is indeed the most successful member of a family of approaches to compress Web graphs based on their statistical properties [5,7,1,23,21,20]. It allows fast extraction of the neighbors of a page while spending just a few bits per link (about 2 to 6, depending on the desired navigation performance).…”

Section: Introduction (mentioning)

confidence: 99%

k²-Trees for Compact Web Graph Representation

Brisaboa

Ladra

Navarro

2009

Lecture Notes in Computer Science



Abstract. This paper presents a Web graph representation based on a compact tree structure that takes advantage of large empty areas of the adjacency matrix of the graph. Our results show that our method is competitive with the best alternatives in the literature, offering a very good compression ratio (3.3-5.3 bits per link) while permitting fast navigation on the graph to obtain direct as well as reverse neighbors (2-15 microseconds per neighbor delivered). Moreover, it allows for extended functionality not usually considered in compressed graph representations.
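To make the idea of exploiting large empty areas of the adjacency matrix more concrete, here is a minimal sketch, not the authors' implementation. It assumes a square 0/1 matrix whose side is a power of k, splits each block into k × k sub-blocks, emits one bit per sub-block (1 if it contains any edge), and subdivides only the non-empty ones. The function name and the flat level-order bit list are illustrative choices.

```python
from collections import deque


def build_k2_tree_bits(matrix, k=2):
    """Return level-order bits of a k^2-ary subdivision of a 0/1 matrix.

    Assumption: the matrix is square and its side is a power of k (> 1).
    All-zero blocks are represented by a single 0 bit and never expanded.
    """
    n = len(matrix)
    m = n
    while m > 1 and m % k == 0:
        m //= k
    assert n > 1 and m == 1, "side length must be a power of k"

    bits = []
    queue = deque([(0, 0, n)])  # (top row, left column, side) of blocks to split
    while queue:
        r0, c0, size = queue.popleft()
        sub = size // k
        for i in range(k):
            for j in range(k):
                r, c = r0 + i * sub, c0 + j * sub
                nonempty = any(
                    matrix[x][y]
                    for x in range(r, r + sub)
                    for y in range(c, c + sub)
                )
                bits.append(1 if nonempty else 0)
                # Only non-empty blocks larger than one cell are subdivided.
                if nonempty and sub > 1:
                    queue.append((r, c, sub))
    return bits


if __name__ == "__main__":
    adjacency = [
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(build_k2_tree_bits(adjacency))
```

In this toy example a 16-cell matrix with a single edge is described by 8 bits, because three of the four quadrants are pruned after one bit each. The scan for non-empty blocks here is deliberately naive; it is meant to show the pruning idea, not an efficient construction.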

