Hypergraphs are a natural modeling paradigm for networked systems with multiway interactions. A standard task in network analysis is the identification of closely related or densely interconnected nodes. We propose a probabilistic generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Approximate maximum likelihood inference in this model leads to a clustering objective that generalizes the popular modularity objective for graphs. From this, we derive an inference algorithm that generalizes the Louvain graph community detection method, and a faster, specialized variant in which edges are expected to lie fully within clusters. Using synthetic and empirical data, we demonstrate that the specialized method is highly scalable and can detect clusters where graph-based methods fail. We also use our model to find interpretable higher-order structure in school contact networks, U.S. congressional bill cosponsorship and committees, product categories in copurchasing behavior, and hotel locations from web browsing sessions.
Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called Lamb-daCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, λ, which implicitly controls the size and structure of clusters formed by our framework. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks.
The minimum s-t cut problem in graphs is one of the most fundamental problems in combinatorial optimization, and graph cuts underlie algorithms throughout discrete mathematics, theoretical computer science, operations research, and data science. While graphs are a standard model for pairwise relationships, hypergraphs provide the flexibility to model multi-way relationships, and are now a standard model for complex data and systems. However, when generalizing from graphs to hypergraphs, the notion of a "cut hyperedge" is less clear, as a hyperedge's nodes can be split in several ways. Here, we develop a framework for hypergraph cuts by considering the problem of separating two terminal nodes in a hypergraph in a way that minimizes a sum of penalties at split hyperedges. In our setup, different ways of splitting the same hyperedge have different penalties, and the penalty is encoded by what we call a splitting function.Our framework opens a rich space on the foundations of hypergraph cuts. We first identify a natural class of cardinality-based hyperedge splitting functions that depend only on the number of nodes on each side of the split. In this case, we show that the general hypergraph s-t cut problem can be reduced to a tractable graph s-t cut problem if and only if the splitting functions are submodular. We also identify a wide regime of non-submodular splitting functions for which the problem is NP-hard.We also analyze extensions to multiway cuts with at least three terminal nodes and identify a natural class of splitting functions for which the problem can be reduced in an approximation-preserving way to the node-weighted multiway cut problem in graphs, again subject to a submodularity property. Finally, we outline several open questions on general hypergraph cut problems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.