Guyue Han scite author profile

2016

Algorithms for mining very large graphs, such as those representing online social networks, to discover the relative frequency of small subgraphs within them are of high interest to sociologists, computer scientists and marketers alike. However, computing these network motif statistics via naive enumeration is infeasible, because of either its prohibitive computational cost or access restrictions on the full graph data. Methods that estimate motif statistics with random walks, sampling only a small fraction of the subgraphs in the large graph, address both of these challenges. In this paper, we present a new algorithm, called the Waddling Random Walk (WRW), which estimates the concentration of motifs of any size. It derives its name from the fact that it sways a little to the left and to the right, thus also sampling nodes not directly on the path of the random walk. The WRW algorithm achieves its computational efficiency by not trying to enumerate subgraphs around the random walk, but instead using a randomized protocol to sample subgraphs in the neighborhood of the nodes visited by the walk. In addition, WRW achieves significantly higher accuracy (measured by the closeness of its estimates to the correct value) and higher precision (measured by the low variance of its estimates) than the current state-of-the-art algorithms for mining subgraph statistics. We illustrate these advantages in speed, accuracy and precision using simulations on well-known and widely used graph datasets representing real networks.

... [13]. For example, the clustering coefficient (the number of triangles in relation to the number of wedges) has long served as an important metric in sociometry and social network analysis [14], [15].
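The idea of estimating a subgraph statistic from a random walk can be made concrete with the clustering coefficient mentioned above. The sketch below is not the WRW algorithm. It is a simpler wedge-sampling random walk written in the same spirit. At each visited node it picks two random neighbours, some of which lie off the walk's path, and reweights each draw by deg(v) - 1. That reweighting compensates for the walk visiting nodes in proportion to their degree. The helper names (`build_adj`, `transitivity_exact`, `transitivity_random_walk`) are hypothetical. The exact enumeration function is included only as a baseline for comparison.

```python
import random
from collections import defaultdict


def build_adj(edges):
    """Undirected adjacency sets from an edge list (no self-loops assumed)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def transitivity_exact(adj):
    """Exact clustering coefficient: closed wedges / all wedges.

    Equal to 3 * triangles / wedges, since each triangle closes three wedges.
    """
    closed = total = 0
    for nbrs in adj.values():
        nbrs = list(nbrs)
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                total += 1
                if nbrs[j] in adj[nbrs[i]]:
                    closed += 1
    return closed / total if total else 0.0


def transitivity_random_walk(adj, steps, seed=0):
    """Estimate the clustering coefficient with a simple random walk.

    Assumes a connected graph with at least one edge. At each visited node v,
    one wedge centred at v is sampled by picking two distinct random
    neighbours of v. The walk visits v with long-run frequency proportional
    to deg(v), so a specific wedge at v is drawn with probability
    proportional to 1 / (deg(v) - 1); weighting each draw by deg(v) - 1
    corrects this bias in the ratio estimator.
    """
    rng = random.Random(seed)
    nbr_lists = {v: list(ns) for v, ns in adj.items()}
    v = rng.choice(list(nbr_lists))
    num = den = 0.0
    for _ in range(steps):
        nbrs = nbr_lists[v]
        if len(nbrs) >= 2:
            u, w = rng.sample(nbrs, 2)  # one wedge u - v - w
            weight = len(nbrs) - 1
            den += weight
            if w in adj[u]:  # the wedge is closed into a triangle
                num += weight
        v = rng.choice(nbrs)  # move to a uniformly random neighbour
    return num / den if den else 0.0
```

As a usage example, take the five-node graph with edges (0,1), (0,2), (1,2), (1,3), (2,3), (3,4). It has 2 triangles and 10 wedges, so `transitivity_exact` returns 0.6. The random-walk estimate should land close to 0.6 once the number of steps is large. The sampler touches only the nodes the walk visits and their neighbour lists, never the whole graph.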
In fact, the relative frequencies of network motifs are indicative of important properties of graphs, such as modularity (the tendency of nodes in a network to form tightly interconnected communities), and even play a role in the organization and evolution of networks [6]. Knowledge of these motif statistics, combined with homophily (the tendency of similar nodes to connect to one another), adds to the ability of businesses such as Facebook to better mine their graphs and monetize their social platforms through targeted advertisements [16]. Computing motif statistics, however, is made difficult by two challenges: one computational, and the other having to do with restricted access to the full graph data. The computational challenge arises because accurately computing the relative frequencies of different motifs requires enumerating all the induced subgraphs and checking each one for isomorphism to known motif types. The time complexity of enumerating all induced subgraphs of size $k$ in a graph with $V$ vertices and $E$ edges is exponential in $k$, with an upper bound of $O(E^k)$ and a lower bound of $O(V c^{k-1})$ [17]. Even when $k$ is as small as 4, a graph with only millions of edges can contain hundreds of billions of motif instances. The other problem is one of restricted access because the data on...
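The cost of naive enumeration described above can be illustrated with a deliberately brute-force sketch. It is even simpler than the enumeration algorithms whose bounds are cited, because it examines all C(V, k) node subsets. It keeps the connected induced subgraphs and groups them by edge count. For k = 3 the edge count alone identifies the motif type (2 edges is a wedge, 3 is a triangle). For k = 4 and above it does not: the 4-node path and the 4-node star both have 3 edges, so a real implementation would need a genuine isomorphism check. The function names here are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Depth-first check that the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == nodes


def count_induced_subgraphs(adj, k):
    """Brute-force count of connected induced k-node subgraphs by edge count.

    `adj` maps each node to the set of its neighbours. Every one of the
    C(V, k) node subsets is examined, which is why exact enumeration becomes
    infeasible on large graphs even for small k.
    """
    counts = {}
    for subset in combinations(adj, k):
        if not is_connected(subset, adj):
            continue
        # Number of edges among the chosen nodes (the induced subgraph).
        m = sum(1 for a, b in combinations(subset, 2) if b in adj[a])
        counts[m] = counts.get(m, 0) + 1
    return counts
```

On the five-node graph with edges (0,1), (0,2), (1,2), (1,3), (2,3), (3,4), the call with k = 3 returns {3: 2, 2: 4}. That is 2 triangles and 4 induced (open) wedges, out of 10 subsets checked. On a graph with millions of nodes, the number of subsets alone rules this approach out.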


On Counting Triangles Through Edge Sampling in Large Dynamic Graphs

2019

Traditional frameworks for dynamic graphs have relied on processing only the stream of edges added to or deleted from an evolving graph, but not any additional related information, such as the degrees or neighbor lists of the nodes incident to those edges. In this paper, we propose a new edge sampling framework for big-graph analytics in dynamic graphs, which enhances the traditional model by enabling the use of such additional information. To demonstrate the advantages of this framework, we present a new sampling algorithm, called Edge Sample and Discard (esd). It generates an unbiased estimate of the total number of triangles, which can be continuously updated in response to both edge additions and deletions. We provide a comparative analysis of the performance of esd against two current state-of-the-art algorithms in terms of accuracy and complexity. The results of experiments on real graphs show that, with the help of the neighborhood information of the sampled edges, our algorithm achieves substantially better accuracy. We also characterize the impact of graph properties on the performance of our algorithm by testing it on several Barabási-Albert graphs.
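The benefit of combining edge sampling with neighbor lists can be sketched with a simple estimator. This is not the esd algorithm, whose details are not given here. It is plain Bernoulli edge sampling. Each inserted edge is kept with probability p, and deletions drop the edge from the sample. At query time, each sampled edge (u, v) contributes |N(u) ∩ N(v)|, the number of triangles it belongs to. Every triangle has three edges, so the total scaled by 1 / (3p) is an unbiased estimate of the current triangle count. The class name is hypothetical. The sketch assumes neighbor lists can be queried, which it simulates by storing the whole graph. It also assumes a simple edge stream with no duplicate insertions of an edge that is already present.

```python
import random
from collections import defaultdict


class EdgeSampleTriangleEstimator:
    """Bernoulli edge sampling with neighbour-list lookups (illustrative).

    `self.adj` stands in for queryable neighbour lists of the current graph.
    Each inserted edge enters the sample with probability p; at query time
    each sampled edge contributes the number of triangles containing it.
    """

    def __init__(self, p, seed=0):
        self.p = p
        self.rng = random.Random(seed)
        self.adj = defaultdict(set)
        self.sample = set()

    @staticmethod
    def _key(u, v):
        # Canonical orientation so (u, v) and (v, u) name the same edge.
        return (u, v) if u <= v else (v, u)

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)
        if self.rng.random() < self.p:
            self.sample.add(self._key(u, v))

    def delete_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.sample.discard(self._key(u, v))

    def estimate(self):
        # Each triangle is counted once per sampled edge it contains.
        hits = sum(len(self.adj[u] & self.adj[v]) for u, v in self.sample)
        return hits / (3 * self.p)
```

With p = 1.0 every edge is sampled and the estimate is exact. Inserting the six edges of the complete graph on four nodes gives 4.0, and then deleting edge (0, 1) gives 2.0. With smaller p, the estimate is unbiased but noisier. The neighbor-list lookups are what let a single sampled edge report every triangle it closes.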


Closed walk sampler: An efficient method for estimating the spectral radius of large graphs

2017

Waddling Random Walk: Fast and Accurate Mining of Motif Statistics in Large Graphs

2016

Preprint

Closed Walk Sampler: An Efficient Method for Estimating Eigenvalues of Large Graphs

2020

IEEE Trans. Big Data