2015
DOI: 10.1145/2700395

A Space-Efficient Streaming Algorithm for Estimating Transitivity and Triangle Counts Using the Birthday Paradox

Abstract: We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O(√n) space (n is the number of vertices) to provide accurate estimates. We run …
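
For readers unfamiliar with the quantities named in the abstract, the following minimal sketch computes them exactly on a small in-memory graph. It is not the paper's streaming algorithm; it only illustrates the standard definition of transitivity as 3 × (number of triangles) / (number of wedges), where a wedge is a path of length two. The function name `transitivity` and the edge-list input format are assumptions made for this illustration.

```python
from itertools import combinations


def transitivity(edges):
    """Return (triangles, wedges, transitivity) for an undirected simple graph.

    Illustrative exact computation, not a streaming method: the whole
    graph is held in memory as adjacency sets.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # A wedge is a pair of distinct neighbors sharing a center vertex.
    wedges = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())

    # A wedge is closed when its two endpoints are also adjacent.
    closed = sum(
        1
        for nbrs in adj.values()
        for a, b in combinations(nbrs, 2)
        if b in adj[a]
    )

    triangles = closed // 3  # each triangle contains three closed wedges
    kappa = closed / wedges if wedges else 0.0
    return triangles, wedges, kappa


# A triangle 0-1-2 with one pendant edge 2-3.
print(transitivity([(0, 1), (1, 2), (0, 2), (2, 3)]))
```

In this example the graph has one triangle and five wedges, three of which are closed, so the transitivity is 3/5.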

Cited by 47 publications
(42 citation statements)
References 47 publications
“…Counting subgraphs in large networks is a well studied problem in data mining which was originally brought to attention in the seminal work of Milo et al [5]. In particular, many contributions in the literature have focused on the triangle counting problem, including exact algorithms, MapReduce algorithms [11,12] and streaming algorithms [6,7,13,14].…”
Section: Related Work (mentioning)
confidence: 99%
“…Using a strategy similar to our TieredSampling approach, in [14] Jha and Seshadri propose a one pass streaming algorithm for triangle counting which using a first reservoir for edges which are then used to generate a stream of wedges (i.e., paths of length two) stored in a second reservoir. This approach appear to be not worthwhile for triangle counting as it is consistently outperformed by a simpler strategy based on a single reservoir presented [6].…”
Section: Related Work (mentioning)
confidence: 99%
“…Tsourakakis et al [33] proposed triangle sparsifiers to approximate the triangle counts with a single pass of the graph, hence, the technique can also be applied to incremental streams. Pavan et al [28] and Jha et al [20] proposed sampling a set of connected paths of length for approximately counting the triangles in incremental streams. Lim and Kang [27] proposed an algorithm based on Bernoulli sampling of edges for incremental streams, in which the edges are kept in the sample with a fixed user-defined probability.…”
Section: Related Work (mentioning)
confidence: 99%
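
The fixed-probability edge sampling idea mentioned in the statement above can be sketched as follows. This is a generic illustration under stated assumptions, not the exact estimator of any of the cited papers: each edge is kept independently with probability `p`, triangles are counted in the sample, and the count is scaled by 1/p³ because a triangle survives only if all three of its edges are kept. The function name `sampled_triangle_estimate` and its `seed` parameter are hypothetical.

```python
import random


def sampled_triangle_estimate(edges, p, seed=0):
    """Estimate the triangle count from a Bernoulli sample of the edges.

    Illustrative sketch: keep each edge with probability p, count
    triangles among the kept edges, and scale the count by 1 / p**3.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)

    # Build adjacency sets from the sampled edges only.
    adj = {}
    for u, v in edges:
        if u != v and rng.random() < p:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # Summing common neighbors over every sampled edge counts each
    # triangle three times, once per edge.
    common = sum(
        len(adj[u] & adj[v])
        for u in adj
        for v in adj[u]
        if u < v
    )
    return (common // 3) / p**3


# Complete graph on four vertices: exactly four triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(sampled_triangle_estimate(k4, p=1.0))
```

With `p=1.0` every edge is kept, so the estimate equals the exact count. Smaller values of `p` reduce memory use at the cost of higher variance.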
“…Biased sampling to make an unbiased estimate of a property of the full graph has recently gained increased attention in the analysis of large graphs. However, its use so far has largely been limited to ascertaining only localized properties of graphs (such as the number of triangles in the graph, the clustering co-efficient or its motif statistics) and not a complex property like the spectral radius [16]- [19].…”
Section: A Subgraph Sampling (mentioning)
confidence: 99%