2015
DOI: 10.14778/2794367.2794373

Leveraging history for faster sampling of online social networks

Abstract: With a vast amount of data available on online social networks, how to enable efficient analytics has become an increasingly important research problem. Many existing studies resort to sampling techniques that draw random nodes from an online social network through its restrictive web/API interface. Almost all of these techniques use the same underlying technique of random walk, a Markov Chain Monte Carlo based method that iteratively transits from one node to a random neighbor. Random walk fits nat…
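
The random walk the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy `graph`, the `neighbors` callback standing in for a web/API interface, and the function name `simple_random_walk` are all assumptions made for the example.

```python
import random

def simple_random_walk(neighbors, start, steps, rng=None):
    """Take up to `steps` transitions, each to a uniformly random neighbor.

    `neighbors` is a hypothetical callback that stands in for the
    restrictive interface: given a node, it returns that node's neighbors.
    """
    rng = rng or random.Random()
    path = [start]
    current = start
    for _ in range(steps):
        nbrs = neighbors(current)
        if not nbrs:  # dead end: nothing to transit to
            break
        current = rng.choice(nbrs)
        path.append(current)
    return path

# Toy undirected graph (an assumption for illustration only).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
path = simple_random_walk(lambda v: graph[v], "a", 10, random.Random(0))
```

Every consecutive pair in `path` is an edge of the graph, which is all the interface needs to support.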

Cited by 21 publications (9 citation statements)
References 25 publications

“…The first one is the use of non-backtracking random walks (NBTRW) [18] which removes the tottering behaviour of random walks. The second one is circulating the neighbors of every node with a vertex specific queue (CNRW) [50]. A third strategy involves teleports: the random walker jumps (RWJ) [34] with a fixed probability to a random node from the current vertex.…”
Section: Edge Sampling (citation type: mentioning)
confidence: 99%
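
The three strategies named in the statement above can be sketched as single-step functions. This follows the wording of the quoted description only and is not taken from the cited papers; in particular, the per-vertex queue is a simplified reading of "vertex specific queue". The function names, the toy `graph`, and the `jump_prob` parameter are assumptions, and every node is assumed to have at least one neighbor.

```python
import random
from collections import deque

def nbrw_step(graph, prev, cur, rng):
    """Non-backtracking step: avoid `prev` unless it is the only neighbor."""
    options = [v for v in graph[cur] if v != prev]
    return rng.choice(options or graph[cur])

def make_circulating_step(graph, rng):
    """Return a step function that cycles through each node's neighbors.

    Each node keeps its own shuffled queue; a neighbor is not repeated
    until the queue is exhausted, at which point it is reshuffled.
    """
    queues = {}

    def step(cur):
        q = queues.get(cur)
        if not q:  # first visit, or queue exhausted
            order = list(graph[cur])
            rng.shuffle(order)
            q = queues[cur] = deque(order)
        return q.popleft()

    return step

def jump_step(graph, cur, rng, jump_prob=0.1):
    """With probability `jump_prob`, teleport to a uniformly random node."""
    if rng.random() < jump_prob:
        return rng.choice(list(graph))
    return rng.choice(graph[cur])

# Toy undirected graph (an assumption for illustration only).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
rng = random.Random(0)
circulate = make_circulating_step(graph, rng)
first_three_from_c = [circulate("c") for _ in range(3)]
```

In this sketch, three consecutive circulating steps from `"c"` visit each of its three neighbors exactly once. Note that `jump_step` assumes the full node list is available, which a restrictive interface may not provide.
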
“…Since the query cost is a key factor when estimating the properties of social networks with a random-walk based sampling method, we use the query costs to evaluate the costs of obtaining the individual and social attributes of social networks during the process of sampling. As described in [15], obtaining a node along with its neighbor nodes can be considered one query from social networks. We simulate the query costs over com-DBLP and amazon-0601 as a function of the number of samples when using the five sampling methods to obtain the individual and social attributes.…”
Section: Sampling Costs 1) Query Cost (citation type: mentioning)
confidence: 99%
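
The query-cost convention in the statement above (one node plus its neighbor list counts as one query) can be sketched with a small counting wrapper. The class name `CountingInterface` and the toy `graph` are hypothetical. Treating repeated requests for an already fetched node as free, via a local cache, is an assumption of this sketch and is not stated in the quoted text.

```python
class CountingInterface:
    """Wrap a graph and count one query per distinct node fetched."""

    def __init__(self, graph):
        self._graph = graph
        self._cache = {}      # node -> neighbor list already retrieved
        self.query_cost = 0   # number of queries issued so far

    def neighbors(self, node):
        if node not in self._cache:
            # Fetching a node together with its neighbors is one query.
            self.query_cost += 1
            self._cache[node] = list(self._graph[node])
        return self._cache[node]

# Toy undirected graph (an assumption for illustration only).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
api = CountingInterface(graph)
api.neighbors("a")
api.neighbors("a")  # served from the cache, no extra cost in this sketch
api.neighbors("b")
```

After these calls, `api.query_cost` is 2, because only two distinct nodes were fetched.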
“…Therefore, during the process of SRW, there is a very likely bias in that nodes with higher degrees tend to be more repeatedly sampled than those with lower degrees, resulting in both over-sampled nodes (i.e., of higher degrees) and under-sampled nodes (i.e., of lower degrees), leading to a severe lack of diversity among the samples. Non-backtracking random walk (NBRW), proposed in [3], and Circulated Neighbors random walk (CNRW), proposed in [15], are based on the idea of non-backtracking to a very small fraction of the sampled paths. In this context, a sampling path refers to an edge through which the random walker goes from the current sample to the next.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Many techniques have been proposed to improve the efficiency of random walk-based algorithms, for example, non-backtracking random walk [17], random walk leveraging walk history [40], rejection controlled Metropolis-Hastings random walk [19], random walk with jump [39], etc. In this subsection, we introduce non-backtracking random walk (NB-SRW) to our estimation framework as an example to show how to integrate these techniques with our framework.…”
Section: Non-backtracking Random Walk (citation type: mentioning)
confidence: 99%