In this paper, community detection in hypergraphs is explored. Under a generative hypergraph model called "d-wise hypergraph stochastic block model" (d-hSBM) which naturally extends the Stochastic Block Model (SBM) from graphs to d-uniform hypergraphs, the asymptotic minimax mismatch ratio (the risk function in the community detection problem) is characterized. For proving the achievability, we propose a two-step polynomial time algorithm that provably achieves the fundamental limit. The first step of the algorithm is a hypergraph spectral clustering method which achieves partial recovery to a certain precision level. The second step is a local refinement method which leverages the underlying probabilistic model along with parameter estimation from the outcome of the first step. To characterize the asymptotic performance of the proposed algorithm, we first derive a sufficient condition for attaining weak consistency in the hypergraph spectral clustering step. Then, under the guarantee of weak consistency in the first step, we upper bound the worst-case risk attained in the local refinement step by an exponentially decaying function of the size of the hypergraph and characterize the decaying rate. Compared to the existing works in SBM, the main technical challenge lies in the complex structure of error events since community relations become much more complicated. We tackle the challenge by a series of non-trivial generalizations. To ensure the optimality of the refinement algorithm, all possible misclassification patterns among a wealth of community relations should be considered and treated with care when concentration inequalities are applied. Moreover, in proving the performance guarantee of the spectral clustering algorithm, we have to deal with random matrices whose entries are no longer independent as opposed to its graph counterpart. For proving the converse, the lower bound of the minimax mismatch ratio is set by finding a smaller parameter space which contains the most dominant error events, inspired by the analysis in the achievability part. It turns out that the minimax mismatch ratio decays exponentially fast to zero as the number of nodes tends to infinity, and the rate function is a weighted combination of several divergence terms, each of which is the Rényi divergence of order 1/2 between two Bernoulli distributions. The Bernoulli distributions involved in the characterization of the rate function are those governing the random instantiation of hyperedges in d-hSBM. Experimental results on synthetic data validate our theoretical finding that the refinement step is critical in achieving the optimal statistical limit. I (Eli) Chien was with the
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.