Nowadays, community detection has become an increasingly challenging problem. With applications in a wide range of fields, such as sociology, digital marketing, bioinformatics, chemical engineering, and computer science, the need for scalable and efficient solutions is evident. This is especially true in the rapidly growing area of social media, where the corresponding networks typically exceed hundreds of millions of vertices. However, standard sequential algorithms have proven in practice to be not only infeasible but also poorly scalable, owing to their excessive computational demands and resource requirements. Therefore, compatible distributed machine learning solutions appear to be the most promising option for tackling this NP-hard problem. The purpose of this work is to propose a novel distributed community detection methodology, based on the supervised community prediction concept, that is highly scalable and efficient and that circumvents the intrinsic adversities of classic community detection approaches.
Presently, owing to the wide availability of massive information networks and the beneficial application of graph analysis in various scientific fields, efficient and highly scalable community detection algorithms have never been more necessary. Despite the significant amount of published research, existing methods (such as Girvan-Newman, random-walk edge betweenness, vertex centrality, InfoMap, and spectral clustering) have proven in practice incapable of handling real-life social graphs, because their intrinsic computational restrictions lead to mediocre performance and poor scalability. The purpose of this article is to introduce a novel, distributed community detection methodology which, in accordance with the community prediction concept, leverages the reduced complexity and decreased variance of bagging ensemble methods to unveil the underlying community hierarchy. The proposed approach has been thoroughly tested, compared against several classic community detection algorithms, and shown to be highly scalable, efficient, and promisingly accurate in unfolding the underlying community structure.
Information 2020, 11, 199
… of community detection's field [2,3] is the evaluation of homophily in social networks, expressed as the identification of the underlying community structure. With a wide range of applications, such as recommendation systems, targeted market analysis, viral marketing, and social influence analysis, community detection has proven significant for revealing the inner mechanisms and evolution of information networks. However, as clearly demonstrated in [3], there is no universally accepted definition of a community.
Nevertheless, in the context of social graphs, a community can intuitively be defined as a group of vertices that are more densely connected with each other (a.k.a., intra-connected) than with the rest of the graph (a.k.a., inter-connected). Thus, community detection can alternatively be interpreted as the classification of edges as either intra-connections, which link vertices of the same community, or inter-connections, which link vertices of different communities [3,4]. As thoroughly described in [2-10], owing to the widespread application of community detection, ample research has already been conducted to efficiently unveil the underlying community structure of information networks. The classic community detection algorithms were originally designed to be applied generally to any information network. These techniques, such as the Girvan-Newman [6] algorithm, the edge centrality [7] method, the geodesic edge betweenness [2] approach, the Kernighan-Lin algorithm [2], and the Latora-Marchiori [2] algorithm, are essentially recursive methods of high polynomial computational complexity that predominantly rely on an iteratively modified and repeatedly recalculated set of global network topology metrics in order to extract the underlying community hierarchy from any possib...
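To make the edge-classification view of community detection concrete, the following sketch (a hypothetical illustration, not the authors' code) labels every edge of a small undirected graph as an intra- or inter-connection given a known community assignment. It then shows the converse direction: if the intra-connection edges are known, for instance because a classifier predicted them, the communities can be recovered as the connected components of the subgraph those edges form, assuming each community is connected through its own intra-connection edges. The function names and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def label_edges(edges, community):
    """Label each undirected edge 'intra' if both endpoints share a community, else 'inter'."""
    return {
        (u, v): "intra" if community[u] == community[v] else "inter"
        for u, v in edges
    }


def communities_from_intra_edges(nodes, intra_edges):
    """Recover communities as connected components of the intra-edge subgraph (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in intra_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort members and groups so the result is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

    labels = label_edges(edges, community)
    print(labels[(2, 3)])  # inter

    intra = [e for e, lab in labels.items() if lab == "intra"]
    print(communities_from_intra_edges(range(6), intra))  # [[0, 1, 2], [3, 4, 5]]
```

Framed this way, the hard part of community prediction reduces to deciding the label of each edge; once the labels exist, grouping vertices is a near-linear union-find pass.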
Nowadays, given the extensive use of information networks in a broad range of fields, e.g., bioinformatics, sociology, digital marketing, and computer science, graph theory applications have attracted significant scientific interest. Owing to its generality, community detection has become one of the most thoroughly studied graph partitioning problems. However, existing algorithms principally propose iterative solutions of high polynomial order that repeatedly require exhaustive analysis. These methods can be considered overly resource-demanding, unscalable, and inapplicable to big data graphs, such as today's social networks. In this article, a novel, near-linear, and highly scalable community prediction methodology is introduced. Specifically, using a distributed, stacking-based model built on plain network topology characteristics of bootstrap-sampled subgraphs, the underlying community hierarchy of any given social network is efficiently extracted regardless of its size and density. The effectiveness of the proposed methodology has been examined on numerous real-life social networks and shown to be superior to various similar approaches in terms of performance, stability, and accuracy.
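The abstract above mentions plain topology characteristics computed on bootstrap-sampled subgraphs. As a minimal sketch, assuming that a bootstrap-sampled subgraph is built from edges drawn uniformly with replacement, and that the characteristics are simple neighbourhood statistics (degrees, common neighbours, and the Jaccard coefficient, chosen here purely for illustration), the code below builds several such subgraphs and computes a feature vector for every sampled edge. Tables like these are the kind of input that base learners and a stacking meta-learner could be trained on; the learners themselves are omitted, and all function names are hypothetical. The graph is assumed to be simple and undirected.

```python
import random
from collections import defaultdict


def bootstrap_subgraph(edges, rng):
    """Draw len(edges) edges uniformly with replacement and build the subgraph's adjacency.

    Repeated draws of the same edge collapse into a single edge of the subgraph.
    """
    sample = set(rng.choices(edges, k=len(edges)))
    adj = defaultdict(set)
    for u, v in sample:
        adj[u].add(v)
        adj[v].add(u)
    return sorted(sample), adj


def edge_features(adj, u, v):
    """Plain topology features of edge (u, v) inside one subgraph (illustrative choice)."""
    common = len(adj[u] & adj[v])
    union = len(adj[u] | adj[v])
    return {
        "deg_u": len(adj[u]),
        "deg_v": len(adj[v]),
        "common_neighbours": common,
        # Jaccard coefficient of the two endpoint neighbourhoods.
        "jaccard": common / union if union else 0.0,
    }


def bootstrap_feature_tables(edges, n_samples, seed=0):
    """Return one {edge: features} table per bootstrap subgraph."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_samples):
        sample, adj = bootstrap_subgraph(edges, rng)
        tables.append({(u, v): edge_features(adj, u, v) for u, v in sample})
    return tables


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    for i, table in enumerate(bootstrap_feature_tables(edges, n_samples=3, seed=42)):
        print(f"subgraph {i}: {len(table)} distinct edges")
        print("  features of (2, 3):", table.get((2, 3), "not sampled"))
```

Because a bootstrap sample of n edges contains on average about 63% of the distinct edges for large n, each subgraph yields a slightly different view of the topology; an ensemble trained on the different tables (whether by bagging or stacking) can exploit that variation.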