In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V, E), we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks. For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations. Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log |V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results on massive graphs demonstrating the practical efficiency of our approach.
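The min-wise idea in the abstract above can be illustrated with a small in-memory sketch. It is not the paper's algorithm: it assumes the whole edge list fits in memory, uses explicit random permutations as a stand-in for min-wise independent hash functions, and the function names and the `passes` parameter are hypothetical. For an edge (u, v), the fraction of passes in which the minimum label over N(u) equals the minimum label over N(v) estimates the Jaccard coefficient J of the two neighborhoods, and |N(u) ∩ N(v)| = J / (1 + J) · (|N(u)| + |N(v)|).

```python
import random
from collections import defaultdict

def exact_local_triangles(edges):
    """Exact per-node triangle counts, for comparison on small graphs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Each triangle at u is counted once per each of its two other corners.
    return {u: sum(len(adj[u] & adj[v]) for v in adj[u]) // 2 for u in adj}

def minhash_local_triangles(edges, passes=200, seed=0):
    """Estimate per-node triangle counts from min-hash matches on edges.

    Assumes `edges` lists each undirected edge exactly once.
    """
    rng = random.Random(seed)
    nodes = sorted({x for e in edges for x in e})
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    matches = defaultdict(int)
    for _ in range(passes):
        # A fresh random permutation of the node labels for this pass.
        order = list(range(len(nodes)))
        rng.shuffle(order)
        label = dict(zip(nodes, order))
        # Scan 1: minimum label over each node's neighborhood.
        min_label = {u: len(nodes) for u in nodes}
        for u, v in edges:
            min_label[u] = min(min_label[u], label[v])
            min_label[v] = min(min_label[v], label[u])
        # Scan 2: count edges whose endpoints agree on the minimum.
        for u, v in edges:
            if min_label[u] == min_label[v]:
                matches[(u, v)] += 1
    triangles = defaultdict(float)
    for u, v in edges:
        j = matches[(u, v)] / passes  # Jaccard estimate
        common = j / (1 + j) * (degree[u] + degree[v])
        # T(u) = (1/2) * sum over neighbors v of |N(u) ∩ N(v)|.
        triangles[u] += common / 2
        triangles[v] += common / 2
    return dict(triangles)
```

Each pass keeps one value per node and reads the edge list sequentially, which is the access pattern a semi-streaming setting allows; the accuracy of the estimate improves as `passes` grows.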
We study the problem of online team formation. We consider a setting in which people possess different skills and compatibility among potential team members is modeled by a social network. A sequence of tasks arrives in an online fashion, and each task requires a specific set of skills. The goal is to form a new team upon arrival of each task, so that (i) each team possesses all skills required by the task, (ii) each team has small communication overhead, and (iii) the workload of performing the tasks is balanced among people in the fairest possible way. We propose efficient algorithms that address all these requirements: our algorithms form teams that always satisfy the required skills, provide approximation guarantees with respect to team communication overhead, and are online-competitive with respect to load balancing. Experiments performed on collaboration networks among film actors and scientists confirm that our algorithms are successful at balancing these conflicting requirements. This is the first paper that simultaneously addresses all these aspects. Previous work has focused either on minimizing coordination for a single task or on balancing the workload while neglecting coordination costs.
We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether or not the hosts exhibit Web spam aspects. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.
We study Plurality Consensus in the GOSSIP Model over a network of n anonymous agents. Each agent supports an initial opinion or color. We assume that at the onset, the number of agents supporting the plurality color exceeds that of the agents supporting any other color by a sufficiently large bias, though the initial plurality itself might be very far from absolute majority. The goal is to provide a protocol that, with high probability, brings the system into the configuration in which all agents support the (initial) plurality color. We consider the Undecided-State Dynamics, a well-known protocol which uses just one more state (the undecided one) than those necessary to store colors. We show that the speed of convergence of this protocol depends on the initial color configuration as a whole, not just on the gap between the plurality and the second largest color community. This dependence is best captured by a novel notion we introduce, namely, the monochromatic distance md(c), which measures the distance of the initial color configuration c from the closest monochromatic one. In the complete graph, we prove that, for a wide range of the input parameters, this dynamics converges within O(md(c) log n) rounds. We prove that this upper bound is almost tight in the strong sense: starting from any color configuration c, the convergence time is Ω(md(c)). Finally, we adapt the Undecided-State Dynamics to obtain a fast, random walk-based protocol for plurality consensus on regular expanders. This protocol converges in O(md(c) polylog(n)) rounds using only polylog(n) local memory. A key ingredient to achieve the above bounds is a new analysis of the maximum node congestion that results from performing n parallel random walks on regular expanders. All our bounds hold with high probability.
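A minimal simulation of the Undecided-State Dynamics on the complete graph may help make the protocol concrete. The abstract does not spell out the update rule, so the sketch assumes the rule as it is commonly stated: in each synchronous round every agent pulls the state of one uniformly random agent; a colored agent that sees a different color becomes undecided, and an undecided agent adopts whatever state it sees. Sampling may return the agent itself, which is a simplification, and the function names are hypothetical.

```python
import random

UNDECIDED = None  # the single extra state beyond the colors

def undecided_round(states, rng):
    """One synchronous round: every agent pulls one uniformly random agent."""
    n = len(states)
    new_states = []
    for own in states:
        seen = states[rng.randrange(n)]
        if own is UNDECIDED:
            # An undecided agent copies the state it sees (possibly undecided).
            new_states.append(seen)
        elif seen is not UNDECIDED and seen != own:
            # A colored agent that sees a different color becomes undecided.
            new_states.append(UNDECIDED)
        else:
            # Same color seen, or an undecided agent seen: keep the color.
            new_states.append(own)
    return new_states

def run_until_consensus(states, max_rounds=10_000, seed=0):
    """Iterate until all agents share one state or max_rounds is reached.

    Returns (common_state, rounds); common_state is None both when no
    consensus was reached and when every agent ended up undecided.
    """
    rng = random.Random(seed)
    rounds = 0
    while len(set(states)) > 1 and rounds < max_rounds:
        states = undecided_round(states, rng)
        rounds += 1
    winner = states[0] if len(set(states)) == 1 else None
    return winner, rounds
```

For instance, `run_until_consensus(["red"] * 700 + ["blue"] * 300)` runs the dynamics from a biased two-color configuration; the number of rounds returned varies with the seed and the initial configuration.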
We study a plurality-consensus process in which each of n anonymous agents of a communication network initially supports a color chosen from the set [k]. Then, in every round, each agent can revise his color according to the colors currently held by a random sample of his neighbors. It is assumed that the initial color configuration exhibits a sufficiently large bias s towards a fixed plurality color, that is, the number of nodes supporting the plurality color exceeds the number of nodes supporting any other color by s additional nodes. The goal is for the process to converge to the stable configuration in which all nodes support the initial plurality. We consider a basic model in which the network is a clique and the update rule (called here the 3-majority dynamics) of the process is the following: each agent looks at the colors of three random neighbors and then applies the majority rule (breaking ties uniformly). We prove that the process converges in time (Formula presented.) with high probability, provided that (Formula presented.). We then prove that our upper bound above is tight as long as (Formula presented.). This fact implies an exponential time-gap between the plurality-consensus process and the median process (see Doerr et al. in Proceedings of the 23rd annual ACM symposium on parallelism in algorithms and architectures (SPAA'11), pp 149–158. ACM, 2011). A natural question is whether looking at more (than three) random neighbors can significantly speed up the process. We provide a negative answer to this question: in particular, we show that samples of polylogarithmic size can speed up the process by a polylogarithmic factor only.
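The 3-majority update rule described above can be sketched in a few lines. This is an illustration under stated assumptions: the three samples are drawn uniformly with replacement from all agents (so an agent may sample itself), a three-way tie is broken by picking one of the three sampled colors uniformly, and the function names are hypothetical.

```python
import random
from collections import Counter

def majority_of_three(sample, rng):
    """Return the color held by at least two of the three samples,
    or a uniformly random one of the three if all differ."""
    color, count = Counter(sample).most_common(1)[0]
    return color if count >= 2 else rng.choice(sample)

def three_majority_round(colors, rng):
    """One synchronous round of the 3-majority dynamics on a clique."""
    n = len(colors)
    return [
        majority_of_three([colors[rng.randrange(n)] for _ in range(3)], rng)
        for _ in range(n)
    ]

def rounds_to_consensus(colors, max_rounds=10_000, seed=0):
    """Count rounds until every agent holds the same color."""
    rng = random.Random(seed)
    rounds = 0
    while len(set(colors)) > 1 and rounds < max_rounds:
        colors = three_majority_round(colors, rng)
        rounds += 1
    return colors[0] if len(set(colors)) == 1 else None, rounds
```

Running `rounds_to_consensus` on configurations with different numbers of colors and different biases gives an empirical feel for how convergence time depends on the initial configuration.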