The statistical inference of stochastic block models has emerged as a mathematically principled method for identifying communities in networks. Its objective is to find the node partition and the block-to-block adjacency matrix of maximum likelihood, i.e., the ones most likely to have generated the observed network. In practice, in the so-called microcanonical ensemble, it is frequently assumed that, when comparing two models with the same number and sizes of communities, the better one is the one of minimum entropy, i.e., the one that can generate the fewest distinct networks. In this paper, we show that there are situations in which the minimum entropy model does not identify the most significant communities in terms of edge distribution, even though it generates the observed graph with a higher probability.
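The link between minimum entropy and higher likelihood can be illustrated with a small sketch. Assuming a microcanonical SBM over simple undirected graphs (no multi-edges, no self-loops), every graph compatible with the group sizes n_r and block edge counts e_rs is equally likely. The number of such graphs is Ω = ∏_{r<s} C(n_r n_s, e_rs) · ∏_r C(n_r(n_r−1)/2, e_rr), the entropy is S = log Ω, and the probability of the observed graph is 1/Ω. The function names and the toy graph are hypothetical, and multigraph variants of the ensemble count graphs differently. This example does not reproduce the counterexamples studied in the paper.

```python
from math import comb, log

def group_sizes(partition, B):
    """Number of nodes in each of the B groups; partition[i] is node i's group."""
    sizes = [0] * B
    for g in partition:
        sizes[g] += 1
    return sizes

def block_edge_counts(edges, partition, B):
    """e[r][s] (r <= s): number of edges between groups r and s (or inside r if r == s)."""
    e = [[0] * B for _ in range(B)]
    for u, v in edges:
        r, s = sorted((partition[u], partition[v]))
        e[r][s] += 1
    return e

def microcanonical_entropy(sizes, e):
    """S = log of the number of simple graphs with these group sizes and block edge counts."""
    B = len(sizes)
    S = 0.0
    for r in range(B):
        for s in range(r, B):
            # Available node pairs: within a group, or between two distinct groups.
            pairs = sizes[r] * (sizes[r] - 1) // 2 if r == s else sizes[r] * sizes[s]
            S += log(comb(pairs, e[r][s]))
    return S

# Toy graph: two paths 0-1-2 and 3-4-5 (hypothetical example).
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
part_a = [0, 0, 0, 1, 1, 1]  # groups {0,1,2} and {3,4,5}
part_b = [0, 0, 1, 0, 1, 1]  # groups {0,1,3} and {2,4,5}, same sizes

for part in (part_a, part_b):
    e = block_edge_counts(edges, part, 2)
    S = microcanonical_entropy(group_sizes(part, 2), e)
    print(e, S)
```

Both partitions have two groups of three nodes. Partition A is compatible with 3 · 1 · 3 = 9 graphs, while partition B is compatible with 3 · 36 · 3 = 324, so A has lower entropy and assigns the observed graph the higher probability (1/9 versus 1/324).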
Community detection in graphs often relies on ad hoc algorithms that do not clearly specify which node partition they consider best, which leads to communities that are hard to interpret. Stochastic block models (SBMs) offer a framework to rigorously define communities and to detect them using statistical inference methods that distinguish structure from random fluctuations. In this paper, we introduce an alternative definition of SBMs based on edge sampling. From this definition, we derive a quality function to statistically infer the node partition used to generate a given graph. We then test it on synthetic graphs and on the Zachary karate club network.
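To make the idea of defining an SBM through edge sampling concrete, here is a generic sketch. It is a hypothetical illustration and not the definition or quality function introduced in the paper. Each of m edges is drawn independently: first a block pair (r, s) with probability p_rs, then one node uniformly from group r and one from group s. The function name and the `block_probs` parameter are assumptions. Because draws are independent, the result is a multigraph that may contain repeated edges and self-loops.

```python
import random

def sample_edges(groups, block_probs, m, seed=None):
    """Draw m edges by edge sampling (hypothetical generic scheme).

    groups: list of node lists, one per group.
    block_probs: dict {(r, s): probability} over block pairs, summing to 1.
    Returns a list of (u, v) pairs; repeats and self-loops are possible.
    """
    rng = random.Random(seed)
    pairs = list(block_probs)
    weights = [block_probs[p] for p in pairs]
    edges = []
    for _ in range(m):
        # Choose which pair of blocks this edge connects.
        r, s = rng.choices(pairs, weights=weights, k=1)[0]
        # Choose the endpoints uniformly inside the chosen blocks.
        u = rng.choice(groups[r])
        v = rng.choice(groups[s])
        edges.append((u, v))
    return edges

groups = [[0, 1, 2], [3, 4, 5]]
probs = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.1}  # mostly within-group edges
print(sample_edges(groups, probs, 20, seed=1))
```

Under a scheme like this, the likelihood of a graph given a partition depends on how its edges are distributed across block pairs, which is the kind of quantity a partition quality function can be built from.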