We consider the problem of finding an optimal description for general unlabeled graphs. Given a probability distribution on labeled graphs, we introduced in [4] a structural entropy as a lower bound for the lossless compression of such graphs. Specifically, we proved that the structural entropy for the Erdős-Rényi random graph, in which edges are added with probability p, is $\binom{n}{2} h(p) - n \log n + O(n)$, where n is the number of vertices and $h(p) = -p \log p - (1-p) \log(1-p)$ is the entropy rate of a conventional memoryless binary source. In this paper, we prove the asymptotic equipartition property for such graphs. Then, we propose a faster compression algorithm that asymptotically achieves the structural entropy up to the first two leading terms with high probability. Our algorithm runs in O(n + e) time on average, where e is the number of edges. To prove its asymptotic optimality, we introduce binary trees that can be classified as lying between tries and digital search trees. We use analytic techniques such as generating functions, the Mellin transform, and poissonization to establish our findings. Our experimental results confirm the theoretical results and show the usefulness of our algorithm for real-world graphs such as the Internet, biological networks, and social networks.
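To make the size of the two leading terms concrete, the following is a minimal sketch that evaluates the binomial term and the n log n term of the structural entropy stated above. It assumes logarithms are base 2, so that results are in bits; the function names are illustrative and not taken from the paper, and the O(n) term is not modeled.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits (assumes base-2 logarithms); h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def labeled_entropy(n: int, p: float) -> float:
    """Entropy of a labeled Erdos-Renyi graph: C(n, 2) * h(p)."""
    return math.comb(n, 2) * binary_entropy(p)


def structural_entropy_leading_terms(n: int, p: float) -> float:
    """First two terms only: C(n, 2) * h(p) - n * log2(n); the O(n) term is omitted."""
    return labeled_entropy(n, p) - n * math.log2(n)
```

For example, with n = 1024 and p = 0.5, the labeled entropy is 523776 bits and the n log n term is 10240 bits, so the two leading terms give 513536 bits. For very small n the omitted O(n) term dominates, and this two-term value is not meaningful.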
Introduction

Brooks argues in [3] that there is "no theory that gives us a metric for information embodied in structure." Shannon himself alluded to this fifty years earlier in his little-known 1953 paper [20]. In fact, Brooks emphasizes the importance of quantifying the information in physical structure. In computer science, however, it is more important to understand the information embodied in abstract structures, which are our particular interest in this paper. For instance, how can we quantify the amount of information in the structure of graphs such as the Internet, social networks, and biological networks? How can we understand and utilize the "structure" of non-conventional data structures such as biological data, topographical maps, medical data, and volumetric data? As a first step toward understanding information in such structures, we focus on structure in graphs.