Leveraging history for faster sampling of online social networks

Zhuojie Zhou, Nan Zhang, Gautam Das

doi:10.14778/2794367.2794373

Cited by 21 publications

(9 citation statements)

References 25 publications


“…The first one is the use of non-backtracking random walks (NBTRW) [18] which removes the tottering behaviour of random walks. The second one is circulating the neighbors of every node with a vertex specific queue (CNRW) [50]. A third strategy involves teleports: the random walker jumps (RWJ) [34] with a fixed probability to a random node from the current vertex.…”

Section: Edge Sampling (mentioning)

confidence: 99%

Little Ball of Fur

Rózemberczki, Kiss, Sarkar, 2020

Proceedings of the 29th ACM International Conference on Information & Knowledge Management


Sampling graphs is an important task in data mining. In this paper, we describe Little Ball of Fur, a Python library that includes more than twenty graph sampling algorithms. Our goal is to make node, edge, and exploration-based network sampling techniques accessible to a large number of professionals, researchers, and students in a single streamlined framework. We created this framework with a focus on a coherent application public interface which has a convenient design, generic input data requirements, and reasonable baseline settings of algorithms. Here we overview these design foundations of the framework in detail with illustrative code snippets. We show the practical usability of the library by estimating various global statistics of social networks and web graphs. Experiments demonstrate that Little Ball of Fur can speed up node and whole graph embedding techniques considerably while only mildly deteriorating the predictive value of distilled features.
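The three walk variants named in the citation statement for this paper (non-backtracking, circulated neighbors, and random walk with jumps) can be illustrated with a minimal sketch. This is not the Little Ball of Fur API and not the original authors' code: the function names are hypothetical, the graph is assumed to be an in-memory adjacency dictionary, and the circulated-neighbors variant follows the "vertex specific queue" wording of the statement, which may simplify the scheme in the cited paper.

```python
import random


def non_backtracking_walk(graph, start, steps, seed=0):
    """Random walk that avoids returning to the node it just left."""
    rng = random.Random(seed)
    walk, prev, cur = [start], None, start
    for _ in range(steps):
        neighbors = graph[cur]
        # Exclude the previous node unless it is the only neighbor.
        candidates = [v for v in neighbors if v != prev] or neighbors
        prev, cur = cur, rng.choice(candidates)
        walk.append(cur)
    return walk


def circulated_neighbors_walk(graph, start, steps, seed=0):
    """Random walk that cycles through each node's neighbors.

    Assumption: one shuffled queue per vertex, refilled once it is
    exhausted, so neighbors are drawn without replacement per cycle.
    """
    rng = random.Random(seed)
    queues = {}
    walk, cur = [start], start
    for _ in range(steps):
        queue = queues.get(cur)
        if not queue:
            # First visit, or every neighbor was already used: reshuffle.
            queue = list(graph[cur])
            rng.shuffle(queue)
            queues[cur] = queue
        cur = queue.pop()
        walk.append(cur)
    return walk


def random_walk_with_jumps(graph, start, steps, jump_prob=0.1, seed=0):
    """Random walk that teleports to a uniform random node with jump_prob.

    Assumption: the full node list is available, which holds in a
    simulation but usually not when crawling a live network.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    walk, cur = [start], start
    for _ in range(steps):
        if rng.random() < jump_prob:
            cur = rng.choice(nodes)
        else:
            cur = rng.choice(graph[cur])
        walk.append(cur)
    return walk
```

Each function takes a dictionary mapping a node to the list of its neighbors and returns the visited nodes in order, including the start node. Keeping per-vertex state in a plain dictionary makes the extra memory cost of the history-based variant explicit.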


“…Since the query cost is a key factor when estimating the properties of social networks by employing a random-walk based sampling method, we use the query costs to evaluate the costs of obtaining the individual and social attributes of social networks during the process of sampling. As described in [15], obtaining a node along with its neighbor nodes can be considered one query from social networks. We simulate the query costs over com-DBLP and amazon-0601 as a function of the number of samples when using the five sampling methods to obtain the individual and social attributes.…”

Section: Sampling Costs 1) Query Cost (mentioning)

confidence: 99%

“…Therefore, during the process of SRW, there is a very likely bias in that nodes with higher degrees tend to be more repeatedly sampled than those with lower degrees, resulting in both over-sampled nodes (i.e., of higher degrees) and under-sampled nodes (i.e., of lower degrees), leading to a severe lack of diversity among the samples. Non-backtracking random walk (NBRW), proposed in [3], and Circulated Neighbors random walk (CNRW), proposed in [15], are based on the idea of non-backtracking to a very small fraction of the sampled paths. In this context, a sampling path refers to an edge through which the random walker goes from the current sample to the next.…”

Section: Introduction (mentioning)

confidence: 99%

2-Hopper: Accurately Estimate Individual and Social Attributes of Social Networks With Fewer Repeats via Random Walk

et al. 2019


Random-walk based sampling is widely used to characterize large graphs by producing samples in the form of nodes. However, existing random-walk based sampling methods only focus on the estimation accuracy of structural properties but suffer from repetitive samples which have adverse effects on obtaining accurate information about the structures over social networks represented by large graphs. Furthermore, these existing methods mainly characterize individual attributes while ignoring the social attributes of the nodes. In this paper, a new random-walk based method, called 2-hop neighbors based random walk or 2-Hopper, is proposed to obtain accurate estimations of both basic and social attributes with fewer repetitive samples. Specifically, 2-Hopper is able to greatly reduce redundant paths among nodes during the sampling process and thus produces few repeats. Based on 2-Hopper's sampling process, a re-weighted estimator is proposed to accurately obtain both the individual and social properties while the latter is obtained by a newly proposed algorithm. Experimental results driven by real-world datasets show that, on average, 2-Hopper reduces the repetitive samples of the state-of-the-art random-walk based methods by a factor of 4.5 and obtains more accurate information about the individual and social attributes, while also estimating the structural properties of these attributes accurately over large graphs.

Index Terms: Random-walk based sampling, few repeats, accurate estimations, basic and social attributes of social networks.
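The query-cost convention and the degree bias described in the citation statements for this paper can be made concrete with a small sketch. It is an illustration under stated assumptions, not the 2-Hopper method: `CountingGraph`, `simple_random_walk`, and `reweighted_mean` are hypothetical names, repeated lookups of the same node are assumed to be served from a local cache at no extra cost, and the estimator is the standard inverse-degree re-weighting for a simple random walk rather than the estimator proposed in the paper.

```python
import random


class CountingGraph:
    """Wraps an adjacency dict and counts queries.

    One query returns a node together with its neighbor list.
    Assumption: repeated lookups of a node are cached and free.
    """

    def __init__(self, adjacency):
        self._adjacency = adjacency
        self._cache = {}
        self.queries = 0

    def neighbors(self, node):
        if node not in self._cache:
            self.queries += 1
            self._cache[node] = self._adjacency[node]
        return self._cache[node]


def simple_random_walk(api, start, steps, seed=0):
    """Simple random walk (SRW); returns the nodes visited after start."""
    rng = random.Random(seed)
    cur, samples = start, []
    for _ in range(steps):
        cur = rng.choice(api.neighbors(cur))
        samples.append(cur)
    return samples


def reweighted_mean(samples, value, degree):
    """Estimate the population mean of value(v) from SRW samples.

    An SRW visits nodes with probability proportional to degree, so each
    sample is weighted by 1 / degree to cancel that bias.
    """
    numerator = sum(value(v) / degree(v) for v in samples)
    denominator = sum(1.0 / degree(v) for v in samples)
    return numerator / denominator
```

On a star graph, for example, the walk spends half of its steps on the hub, so the plain sample mean of the degree overestimates the true average, while the re-weighted mean corrects for the oversampling of the high-degree node.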


“…Many techniques have been proposed to improve the efficiency of random walk-based algorithms, for example, non-backtracking random walk [17], random walk leveraging walk history [40], rejection controlled Metropolis-Hastings random walk [19], random walk with jump [39], etc. In this subsection, we introduce non-backtracking random walk (NB-SRW) to our estimation framework as an example to show how to integrate these techniques with our framework.…”

Section: Non-backtracking Random Walk (mentioning)

confidence: 99%

A general framework for estimating graphlet statistics via random walk

Chen, Wang, et al. 2016

Proc. VLDB Endow.


Graphlets are induced subgraph patterns and have been frequently applied to characterize the local topology structures of graphs across various domains, e.g., online social networks (OSNs) and biological networks. Discovering and computing graphlet statistics are highly challenging. First, the massive size of real-world graphs makes the exact computation of graphlets extremely expensive. Secondly, the graph topology may not be readily available so one has to resort to web crawling using the available application programming interfaces (APIs). In this work, we propose a general and novel framework to estimate graphlet statistics of "any size". Our framework is based on collecting samples through consecutive steps of random walks. We derive an analytical bound on the sample size (via the Chernoff-Hoeffding technique) to guarantee the convergence of our unbiased estimator. To further improve the accuracy, we introduce two novel optimization techniques to reduce the lower bound on the sample size. Experimental evaluations demonstrate that our methods outperform the state-of-the-art method up to an order of magnitude both in terms of accuracy and time cost.
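To give a feel for what a sample-size bound of this kind looks like, the sketch below evaluates the classical Hoeffding bound for independent samples bounded in an interval of width `value_range`: the sample mean is within `epsilon` of the true mean with probability at least `1 - delta` once n >= value_range^2 * ln(2 / delta) / (2 * epsilon^2). This is an assumption-laden stand-in, not the bound derived in the paper: random walk samples are correlated, so a bound for them has to account for that dependence and will differ from this one. The function name is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon, delta, value_range=1.0):
    """Smallest n satisfying the classical i.i.d. Hoeffding bound.

    epsilon: tolerated absolute error of the sample mean.
    delta: tolerated failure probability.
    value_range: width (b - a) of the interval containing each sample.
    """
    if not (epsilon > 0 and 0 < delta < 1 and value_range > 0):
        raise ValueError("need epsilon > 0, 0 < delta < 1, value_range > 0")
    bound = value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(bound)
```

The required sample size grows with 1 / epsilon^2 but only logarithmically with 1 / delta, which is why tightening the accuracy target is far more expensive than tightening the confidence level.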
