Community detection has arisen as one of the most relevant topics in the field of graph mining, principally for its applications in domains such as social or biological network analysis. Different community detection algorithms have been proposed during the last decade, approaching the problem from different perspectives. However, existing algorithms are, in general, based on complex and expensive computations, making them unsuitable for large graphs with millions of vertices and edges such as those usually found in the real world. In this paper, we propose a novel disjoint community detection algorithm called Scalable Community Detection (SCD). By combining different strategies, SCD partitions the graph by maximizing the Weighted Community Clustering (WCC), a recently proposed community detection metric based on triangle analysis. Using real graphs with ground-truth overlapping communities, we show that SCD outperforms the current state-of-the-art proposals (even those aimed at finding overlapping communities) in terms of quality and performance. SCD provides the speed of the fastest algorithms and the quality, in terms of NMI and F1Score, of the most accurate state-of-the-art proposals. We show that SCD is able to run up to two orders of magnitude faster than practical existing solutions by exploiting the parallelism of current multi-core processors, enabling us to process graphs of unprecedented size in short execution times.
Community detection has arisen as one of the most relevant topics in the field of graph data mining due to its importance in many fields such as biology, social networks or network traffic analysis. The metrics proposed to shape communities are generic and follow two approaches: maximizing the internal density of such communities or reducing the connectivity of the internal vertices with those outside the community. However, these metrics take the edges as a set and do not consider the internal layout of the edges in the community. We define a set of properties oriented to social networks that ensure that communities are cohesive, structured and well defined. Then, we propose the Weighted Community Clustering (WCC), which is a community metric based on triangles. We prove that analyzing communities by triangles gives communities that fulfill the listed set of properties, in contrast to previous metrics. Finally, we experimentally show that WCC correctly captures the concept of community in social networks using real and synthetic datasets, and statistically compare some of the most relevant community detection algorithms in the state of the art.
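To illustrate what "based on triangles" can mean in practice, the following minimal Python sketch computes, for one vertex, the fraction of its triangles that close entirely inside a candidate community. This is not the WCC definition, which the abstract does not give; the function names (`triangles_at`, `triangle_fraction`) and the toy graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def triangles_at(vertex, adjacency, allowed=None):
    """Count triangles containing `vertex`.

    If `allowed` is given, only triangles whose other two vertices
    both belong to `allowed` are counted.
    """
    neighbours = adjacency[vertex]
    if allowed is not None:
        neighbours = neighbours & allowed
    # A pair of neighbours closes a triangle when they are adjacent.
    return sum(1 for u, w in combinations(sorted(neighbours), 2)
               if w in adjacency[u])

def triangle_fraction(vertex, community, adjacency):
    """Fraction of the vertex's triangles that lie inside `community`."""
    total = triangles_at(vertex, adjacency)
    if total == 0:
        return 0.0
    return triangles_at(vertex, adjacency, allowed=community) / total

# Toy graph (assumed for illustration): two triangles sharing vertex "c".
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"), ("d", "e"), ("c", "e")]
adjacency = build_adjacency(edges)
community = {"a", "b", "c"}

for vertex in ("a", "c", "d"):
    print(vertex, triangle_fraction(vertex, community, adjacency))
```

In this toy graph, vertex "c" sits in two triangles but only one lies within the candidate community, so an edge-count metric and a triangle-based one can disagree about how well "c" fits.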
Graphs have become an indispensable tool for the analysis of linked data. As with any data representation, the need for database management systems appears when graphs grow in size and complexity. Associated with those needs, benchmarks appear to assess the performance of such systems in specific scenarios that are representative of real use cases. In this paper we propose a microbenchmark based on social networks. This includes a data generator that synthetically creates social graphs, and a set of low-level atomic queries that model parts of the behavior of social network users. In order to understand how different data management paradigms are stressed, we execute the benchmark over five different database systems representing graph (Dex and Neo4j), RDF (RDF-3X) and relational (Virtuoso and PostgreSQL) data management. We conclude that reachability queries are the ones that cause all the database systems the most difficulty, which justifies their inclusion and makes them good candidates for more complex benchmarks.
In this short paper, we provide an early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload which tests graph data management systems on a graph business analytics workload. Its queries involve complex aggregations and navigations (joins) that touch large data volumes, which is typical in BI workloads, yet they depend heavily on graph functionality such as connectivity tests and path finding. We outline the motivation for this new benchmark, which we derived from many interactions with the graph database industry and its users, and situate it in a scenario of social network analysis. The workload was designed by taking into account technical "chokepoints" identified by database system architects from academia and industry, which we also describe and map to the queries. We present reference implementations in openCypher, PGQL, SPARQL, and SQL, and preliminary results of SNB BI on a number of graph data management systems.
In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output, that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from the industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, vendors perform the tuning and benchmarking of their platforms.