In stochastic block models, which are among the most prominent statistical models for cluster analysis of complex networks, clusters are defined as groups of nodes with statistically similar link probabilities within and between groups. A recent extension by Karrer and Newman [Phys. Rev. E 83, 016107 (2011)] incorporates a node degree correction to model degree heterogeneity within each group. Although this demonstrably leads to better performance on several networks, it is not obvious whether modeling node degree is always appropriate or necessary. We formulate the degree-corrected stochastic block model as a nonparametric Bayesian model, incorporating a parameter that controls the amount of degree correction and can be inferred from data. Additionally, our formulation yields principled ways of inferring the number of groups and of predicting missing links in the network; the latter can be used to quantify the model's predictive performance. On synthetic data we demonstrate that including the degree correction yields better performance, both in recovering the true group structure and in predicting missing links, when degree heterogeneity is present, whereas performance is on par for data with no degree heterogeneity within clusters. On seven real networks (with no ground-truth group structure available) we show that predictive performance is about equal whether or not degree correction is included; however, for some networks significantly fewer clusters are discovered when correcting for degree, indicating that the data can be explained more compactly by clusters of nodes with heterogeneous degrees.