Interrelated objects are often represented as homogeneous networks, in which all objects are of the same entity type and all relationships are of the same relationship type. However, heterogeneous information networks, composed of multiple types of objects and/or relationships, are ubiquitous in real life. Mining heterogeneous information networks is a new and promising field of research in data mining, and clustering is an important way to identify underlying patterns in data. Although clustering on homogeneous networks has been studied for several decades, clustering on heterogeneous networks has been explored only recently. Even so, progress has already been made on this topic, ranging from algorithms to various related applications. This paper presents a brief summary of current research on heterogeneous network clustering and outlines some promising research directions. First, it presents a formal definition and two important aspects of heterogeneous information networks to explain why clustering on heterogeneous networks is significant. Then, it provides a concise classification of existing heterogeneous network clustering algorithms based on their methodological principles. In addition, it discusses experimental developments and applications of heterogeneous network clustering. Finally, the paper addresses several open problems and critical issues for future research. WIREs Data Mining Knowl Discov 2014, 4:213–233. doi: 10.1002/widm.1126
This article is categorized under:
Algorithmic Development > Structure Discovery
Technologies > Computational Intelligence
Technologies > Structure Discovery and Clustering