This paper presents a new space-efficient algorithm for counting and sampling triangles-and more generally, constant-sized cliques-in a massive graph whose edges arrive as a stream. Compared to prior work, our algorithm yields significant improvements in the space and time complexity for these fundamental problems. Our algorithm is simple to implement and has very good practical performance on large graphs.
Abstract. Efficient one-pass estimation of F 0 , the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider rangeefficient estimation of F 0 : estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer but an interval of integers. We present a randomized algorithm which yields an ( , δ)-approximation of F 0 , with the following time and space complexities (n is the size of the universe of the items): (1)
The number of triangles in a graph is a fundamental metric, used in social network analysis, link classification and recommendation, and more. Driven by these applications and the trend that modern graph datasets are both large and dynamic, we present the design and implementation of a fast and cache-efficient parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream. It brings together the benefits of streaming algorithms and parallel algorithms. By building on the streaming algorithms framework, the algorithm has a small memory footprint. By leveraging the paralell cache-oblivious framework, it makes efficient use of the memory hierarchy of modern multicore machines without needing to know its specific parameters. We prove theoretical bounds on accuracy, memory access cost, and parallel runtime complexity, as well as showing empirically that the algorithm yields accurate results and substantial speedups compared to an optimized sequential implementation.(This is an expanded version of a CIKM'13 paper of the same title.)
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.