This paper studies how relationships form in heterogeneous information networks (HINs). The objective is not only to predict relationships in a given HIN more accurately but also to discover the interdependencies between different types of relationships. A new relationship prediction method, MULRP, based on multilabel learning (MLL), is proposed. In MULRP, the types of relationship between two nodes are represented by the meta-paths between them, and each type of relationship is assigned a label. Under the MLL framework, any potential relationship, including the target relationship, can be predicted. Moreover, the method outputs reasonable dependency scores between relationships, which provide more viable paths to facilitate the formation of new relationships. The proposed method is evaluated on two real datasets: the DBLP Computer Science Bibliography (DBLP) network and a Twitter network. The experimental results show that, by using heterogeneous information in a supervised MLL setting, MULRP achieves better performance than several baseline binary classification methods and a state-of-the-art relationship prediction method.
KEYWORDS
heterogeneous information networks, meta-path, multilabel learning, relationship prediction
INTRODUCTION
Many complex systems in the real world can be formalized as networks, where nodes represent objects and links represent interactions between objects [15]. Most of these networks are heterogeneous, containing various types of objects and relations. For example, in the online social network (OSN) Twitter, there are different types of nodes, such as users, locations and tweets, and different types of links, such as write/written, follow/followed, check-in/checked-in, etc. As a key subtask in link mining and social network analysis, link prediction aims to predict the future formation of links based on the current or historical network [14]. It has wide applications in bibliographic networks, biological networks, OSNs, recommendation systems and so on. Link prediction can be regarded as a simple binary classification problem: for any two unconnected objects, predict whether a link exists (positive label) or not (negative label). Prediction methods can be based on structural properties of the network [22] or on the attributes of nodes.
Many previous link prediction methods are designed for homogeneous information networks, where all nodes and links are of the same type. These networks are usually simplifications of real interacting systems that ignore their heterogeneity. For example, the co-authorship network only contains the author object and the co-author relationship. It is actually derived from a bibliographic network like DBLP,