2010
DOI: 10.1007/978-3-642-13657-3_12

Online Sampling of High Centrality Individuals in Social Networks

Abstract: In this work, we investigate the use of online or "crawling" algorithms to sample large social networks in order to determine the most influential or important individuals within the network (by varying definitions of network centrality). We describe a novel sampling technique based on concepts from expander graphs. We empirically evaluate this method in addition to other online sampling strategies on several real-world social networks. We find that, by sampling nodes to maximize the expansion of the …

Cited by 52 publications (35 citation statements)
References 20 publications
“…The algorithm computes, for each vertex v, the shortest path to every other vertex and then traverses these paths backwards to efficiently compute the contribution of the shortest paths from v to the betweenness of other vertices. For very large networks, the cost of this algorithm would still be prohibitive in practice, so many approximation algorithms were developed (Jacob et al 2005; Brandes and Pich 2007; Bader et al 2007; Geisberger et al 2008; Maiya and Berger-Wolf 2010; Lim et al 2011). The use of random sampling was one of the more natural approaches to speed up the computation of betweenness.…”
Section: Related Work
confidence: 99%
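The pivot-sampling approach described in this snippet can be illustrated with a short sketch: instead of running Brandes' single-source accumulation from every vertex, run it from k randomly chosen sources and rescale the accumulated dependencies by n/k. The code below is a minimal illustration for unweighted graphs, not the cited authors' implementation; the adjacency-dict representation and all names (approx_betweenness, adj, k) are assumptions made for the example.

```python
import random
from collections import deque

def approx_betweenness(adj, k, seed=0):
    """Estimate (unnormalized) betweenness by running Brandes' single-source
    accumulation from k randomly chosen sources and rescaling by n / k.
    `adj` maps each vertex to an iterable of neighbors (unweighted graph)."""
    rng = random.Random(seed)
    nodes = list(adj)
    bc = {v: 0.0 for v in nodes}
    for s in rng.sample(nodes, k):
        # BFS from s, recording shortest-path counts and predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in nodes}
        sigma[s] = 1.0
        pred = {v: [] for v in nodes}
        order = []                          # vertices in order of discovery
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Traverse the shortest paths backwards, accumulating dependencies.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    return {v: bc[v] * n / k for v in nodes}    # rescale the sampled sum
```

With k equal to the number of vertices this reduces to the exact (unnormalized) Brandes computation; smaller k trades accuracy for speed, which is the trade-off the approximation papers cited above study.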
“…Bader et al (2007) present an adaptive sampling algorithm which computes good estimations for the betweenness of high-centrality vertices, by keeping track of the partial contribution of each sampled vertex, obtained by performing a single-source shortest paths computation to all other vertices. Maiya and Berger-Wolf (2010) use concepts from expander graphs to select a connected sample of vertices. They estimate the betweenness from the sample, which includes the vertices with high centrality.…”
Section: Related Work
confidence: 99%
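The adaptive scheme attributed to Bader et al (2007) in this snippet, in which pivots are drawn one at a time and the partial contribution to a vertex of interest is tracked, can be sketched roughly as follows. The sketch assumes the stopping rule "halt once the accumulated dependency on the target exceeds a constant multiple of n"; the constant c, the helper single_source_dependencies, and the other names are illustrative rather than the published algorithm verbatim.

```python
import random
from collections import deque

def single_source_dependencies(adj, s):
    """One Brandes pass: dependency of source s on every reachable vertex."""
    dist, sigma, pred, order = {s: 0}, {s: 1.0}, {}, []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0.0
                pred[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                pred[w].append(v)
    delta = {v: 0.0 for v in order}
    for w in reversed(order):
        for v in pred.get(w, []):
            delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
    return delta

def adaptive_betweenness(adj, target, c=5.0, seed=0):
    """Estimate the betweenness of `target` only: draw pivots one at a time,
    add each pivot's dependency on `target`, stop once the running total
    exceeds c * n (adaptive stopping), and rescale by n / #pivots."""
    rng = random.Random(seed)
    pivots = [v for v in adj if v != target]    # a pivot contributes nothing to itself
    n = len(adj)
    total, k = 0.0, 0
    while pivots and total < c * n and k < n:
        s = rng.choice(pivots)                  # pivots sampled with replacement
        k += 1
        total += single_source_dependencies(adj, s).get(target, 0.0)
    return total * n / k if k else 0.0
```

The intent, per the snippet, is that high-centrality vertices cross the threshold after relatively few pivots, so exactly the vertices this paper targets are the cheap ones to estimate.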
“…In [20], the authors propose the Expansion Sampling algorithm, which constructs a sample subgraph through maximal expansion: a neighboring node of the current sample is selected into the sample if it has the most neighbors that are neither within the current sample nor neighbors of the nodes in the current sample. In [21], the authors propose the BackLink Count (BLC) algorithm, which includes the node that is most connected to the current sample.…”
Section: Subgraph Sampling
confidence: 99%
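The expansion-maximizing rule quoted above translates almost directly into a greedy loop: score every frontier node by how many of its neighbors lie outside both the current sample and the sample's neighborhood, and add the best-scoring one. The sketch below is a minimal reading of that rule; the adjacency-of-sets representation, the seed_node argument, and the function name are assumptions for the example, not the authors' implementation.

```python
def expansion_sample(adj, size, seed_node):
    """Greedy expansion sampling: grow a connected sample by repeatedly adding
    the frontier node that exposes the most vertices outside the sample and
    its current neighborhood. `adj` maps each vertex to a set of neighbors."""
    sample = {seed_node}
    while len(sample) < size:
        # Frontier: neighbors of the sample that are not yet in the sample.
        frontier = set().union(*(adj[v] for v in sample)) - sample
        if not frontier:
            break                                # component exhausted
        covered = sample | frontier              # S union N(S)
        # Expansion of a candidate = neighbors it would newly expose.
        best = max(frontier, key=lambda v: len(set(adj[v]) - covered))
        sample.add(best)
    return sample
```

The BackLink Count variant mentioned in the same snippet would only change the scoring: rank each frontier node by len(set(adj[v]) & sample), i.e. by how many links it has back into the current sample.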
“…Another approach to extracting a subgraph for estimating the spectral radius is one based on finding the set of nodes that have the largest eigenvalue centrality within the network [20], [21]. In [20], the authors propose the Expansion Sampling algorithm, which constructs a sample subgraph through maximal expansion: a neighboring node of the current sample is selected into the sample if it has the most neighbors that are neither within the current sample nor neighbors of the nodes in the current sample.…”
Section: Subgraph Sampling
confidence: 99%
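One way to read the subgraph-based spectral-radius estimate described here: take the sampled vertex set (for instance, the output of the expansion-sampling sketch above), build the induced adjacency matrix, and return its dominant eigenvalue. The power-iteration sketch below illustrates that reading; the function name and parameters are assumptions, not the method of [20] or [21] verbatim.

```python
import numpy as np

def estimate_spectral_radius(adj, sample, iters=200):
    """Estimate the spectral radius from the subgraph induced by `sample`:
    build its adjacency matrix and take the dominant eigenvalue via power
    iteration (returned as a Rayleigh quotient)."""
    nodes = list(sample)
    index = {v: i for i, v in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for v in nodes:
        for w in adj[v]:
            if w in index:
                A[index[v], index[w]] = 1.0
    x = np.ones(len(nodes))
    for _ in range(iters):
        y = A @ x
        norm = np.linalg.norm(y)
        if norm == 0.0:
            return 0.0                      # edgeless sample
        x = y / norm
    return float(x @ (A @ x))               # ≈ largest eigenvalue of A
```

For an undirected graph the induced subgraph's largest eigenvalue is a lower bound on the full graph's spectral radius, so the estimate improves as the sample captures more of the high-centrality core.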
“…In this setting the concept of hub has been widely studied, and is at the basis of many important applications, ranging from analysis of the structure of the Internet to web searches, from peer-to-peer network analysis to social networks, from Viral Marketing to analysis of the Blogosphere, from outbreaks of epidemics to metabolic network analysis [4,14,1,13,11,24,15,17].…”
Section: Introduction
confidence: 99%