This paper examines important factors for link prediction in networks and provides a general, high-performance framework for the prediction task. Link prediction in sparse networks presents a significant challenge due to the inherent disproportion of links that can form to links that do form. Previous research has typically approached this as an unsupervised problem. While this is not the first work to explore supervised learning, many factors significant in influencing and guiding classification remain unexplored. In this paper, we consider these factors by first motivating the use of a supervised framework through a careful investigation of issues such as network observational period, generality of existing methods, variance reduction, topological causes and degrees of imbalance, and sampling approaches. We also present an effective flow-based prediction algorithm, offer formal bounds on imbalance in sparse network link prediction, and employ an evaluation method appropriate for the observed imbalance. Our careful consideration of the above issues ultimately leads to a completely general framework that outperforms unsupervised link prediction methods by more than 30% in AUC.
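To make the imbalance and the AUC comparison mentioned above concrete, the following is a minimal sketch, not the paper's framework or its flow-based algorithm. It assumes a simple undirected graph stored as a dictionary of neighbor sets, uses common-neighbor counting as a stand-in for a generic unsupervised predictor, and all function names are hypothetical.

```python
from itertools import combinations


def imbalance_ratio(num_nodes, num_edges):
    """Ratio of absent links to present links in a simple undirected graph."""
    possible = num_nodes * (num_nodes - 1) // 2
    return (possible - num_edges) / num_edges


def common_neighbors_scores(adj):
    """Score every currently unlinked pair by its number of shared neighbors.

    `adj` maps each node to the set of its neighbors (assumed symmetric).
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win, which matches the usual rank-based AUC.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Toy usage: a 4-node graph where pairs (a, d) and (b, d) are not yet linked.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_scores = common_neighbors_scores(toy_graph)
```

In a sparse network the ratio returned by `imbalance_ratio` grows roughly linearly with the number of nodes when the average degree stays fixed, which is why a threshold-free measure such as AUC is a common choice for comparing predictors under this kind of imbalance.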
Abstract. Knowledge-sharing online social networks are becoming increasingly pervasive and popular. While the user-to-user interactions in these networks have received substantial attention, the consumption of user-generated content has not been studied extensively. In this work, we use data gathered from digg.com to present novel findings and draw important sociological conclusions regarding the intimate relationship between consumption and social networking. We first demonstrate that individuals' consumption habits influence their friend networks, consistent with the concept of homophily. We then show that one's social network can also influence the consumption of a submission through the activation of an extended friend network. Finally, we investigate the level of reciprocity, or balance, in the network and uncover relationships that are significantly less balanced than expected.