Learning node embedding vectors for unsupervised large-scale heterogeneous networks is a key problem in heterogeneous network embedding research. This paper proposes an unsupervised embedding learning model named LHGI (Large-scale Heterogeneous Graph Infomax). LHGI uses metapath-guided subgraph sampling, which compresses the network while retaining as much of its semantic information as possible. LHGI also adopts contrastive learning: its objective function is the mutual information between positive (normal) and negative node vectors and the global graph vector. Maximizing this mutual information allows the model to be trained without supervision. Experimental results show that, compared with the baseline models, LHGI extracts features better on both medium-scale and large-scale unsupervised heterogeneous networks, and the node vectors it generates perform better in downstream mining tasks.
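The abstract does not give LHGI's exact mutual-information estimator. A common choice in Graph-Infomax-style models (Deep Graph Infomax) is a bilinear discriminator that scores (node vector, global graph vector) pairs, trained with binary cross-entropy; this corresponds to a Jensen-Shannon-based mutual-information estimator. The sketch below assumes that form. The mean-plus-sigmoid readout, the bilinear matrix `W`, and all function names are assumptions, and the encoder that would produce positive vectors from sampled subgraphs (and negative vectors from corrupted ones) is omitted.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def readout(embeddings):
    """Assumed global graph vector: element-wise sigmoid of the mean node vector."""
    n, dim = len(embeddings), len(embeddings[0])
    return [1.0 / (1.0 + math.exp(-sum(h[k] for h in embeddings) / n)) for k in range(dim)]


def discriminator_logit(h, W, s):
    """Bilinear score h^T W s for a (node vector, graph vector) pair."""
    Ws = [sum(w_ij * s_j for w_ij, s_j in zip(row, s)) for row in W]
    return sum(h_i * v_i for h_i, v_i in zip(h, Ws))


def infomax_loss(pos, neg, W):
    """Binary cross-entropy with positive pairs labelled 1 and negative pairs labelled 0.

    pos: node vectors from the real (sub)graph.
    neg: node vectors from a corrupted (sub)graph (corruption scheme not shown).
    Minimizing this loss maximizes a Jensen-Shannon-style estimate of the mutual
    information between node vectors and the global graph vector.
    """
    s = readout(pos)
    loss = sum(softplus(-discriminator_logit(h, W, s)) for h in pos)  # -log sigmoid(x)
    loss += sum(softplus(discriminator_logit(h, W, s)) for h in neg)  # -log(1 - sigmoid(x))
    return loss / (len(pos) + len(neg))
```

With toy vectors, positives aligned with the graph vector and negatives pointing away give a loss below log 2 (the value of an uninformative discriminator), while feeding the positives in as negatives pushes the loss to at least log 2. In practice the encoder and `W` would be learned jointly by gradient descent in an autodiff framework.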
Evaluating papers’ academic influence is a hot issue in scientific research management. Academic big data, in which different types of academic entities coexist, is a rich data source for evaluating academic influence from a broader and more comprehensive perspective. Based on academic big data, a heterogeneous academic network is constructed from the links within and between three types of academic entities (authors, papers and venues), and a new academic influence ranking algorithm, AIRank, is proposed to evaluate papers’ academic influence. AIRank differs from existing academic influence ranking algorithms in two respects. (1) AIRank distinguishes the intensity of influence transmission between different node pairs. Instead of distributing influence evenly among node pairs, AIRank quantifies the transmission intensity of each pair from the emotional attribute (sentiment) of the citation, the semantic similarity of the two nodes and the difference in their academic quality, and then distributes and transmits influence according to that intensity. (2) AIRank incorporates influence transmitted from heterogeneous neighbours when evaluating papers. Using the academic influence of author nodes and venue nodes, AIRank fine-tunes the iteration formula for paper influence, so that papers are ranked under the joint influence of their homogeneous and heterogeneous neighbours. Experimental results show that, compared with rankings based on citation frequency and on the PageRank algorithm, AIRank produces more differentiated and reasonable academic influence rankings.
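The abstract gives no formulas, so the following is only a hypothetical sketch of its two ideas, not AIRank's actual iteration formula. Idea (1) is modelled as an intensity-weighted PageRank: each citing paper splits its score among the papers it cites in proportion to a transmission intensity, which is taken here as a given input rather than computed from citation sentiment, semantic similarity and quality differences. Idea (2) is modelled as blending each update with a prior built from author and venue scores. The names `airank_sketch` and `hetero_prior_from` and the parameters `d`, `lam` and `alpha` are assumptions.

```python
def hetero_prior_from(paper_authors, paper_venue, author_score, venue_score, alpha=0.5):
    """Hypothetical heterogeneous-neighbour signal: blend of mean author score and venue score."""
    prior = {}
    for paper, authors in paper_authors.items():
        author_part = sum(author_score[a] for a in authors) / len(authors) if authors else 0.0
        prior[paper] = alpha * author_part + (1 - alpha) * venue_score[paper_venue[paper]]
    return prior


def airank_sketch(papers, citations, intensity, hetero_prior,
                  d=0.85, lam=0.2, iters=200, tol=1e-12):
    """Intensity-weighted PageRank blended with a heterogeneous prior (illustrative only).

    citations: dict citing paper -> list of cited papers (influence flows citing -> cited).
    intensity: dict (citing, cited) -> positive transmission intensity (given as input here).
    hetero_prior: dict paper -> non-negative score derived from its authors and venue.
    """
    n = len(papers)
    total = sum(hetero_prior.get(p, 0.0) for p in papers)
    # Normalize the prior to a distribution (uniform if it is all zero).
    prior = {p: (hetero_prior.get(p, 0.0) / total if total > 0 else 1.0 / n) for p in papers}
    score = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        incoming = {p: 0.0 for p in papers}
        dangling = 0.0  # score of papers with no references, spread uniformly
        for q in papers:
            refs = citations.get(q, [])
            if not refs:
                dangling += score[q]
                continue
            out_w = sum(intensity[(q, r)] for r in refs)
            for r in refs:
                # Uneven split: stronger links transmit a larger share of q's influence.
                incoming[r] += score[q] * intensity[(q, r)] / out_w
        new = {}
        for p in papers:
            base = (1 - d) / n + d * (incoming[p] + dangling / n)
            new[p] = (1 - lam) * base + lam * prior[p]
        delta = sum(abs(new[p] - score[p]) for p in papers)
        score = new
        if delta < tol:
            break
    return score
```

Because both the PageRank part and the normalized prior sum to 1, the scores remain a distribution, and the update is a contraction with factor `(1 - lam) * d`, so the iteration converges. For example, if paper A cites B with intensity 3 and C with intensity 1, B ends up ranked above C even though each receives exactly one citation, which is the kind of differentiation plain citation counting cannot produce.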
Network embedding is an effective way to analyze large-scale networks quantitatively. However, mainstream network embedding models depend on manually preset metapaths, which makes their performance unstable. In addition, these models mostly use information from homogeneous neighbors to encode the target node and ignore the role of heterogeneous neighbors in node embedding. This paper proposes HeMGNN, a new embedding model for heterogeneous networks. The HeMGNN framework consists of two modules: a metapath subgraph extraction module and a node embedding mixing module. The metapath subgraph extraction module automatically generates the metapaths relevant to the domain mining task and filters them, which avoids excessive dependence of the network embedding on manual prior knowledge. The node embedding mixing module integrates information from both homogeneous and heterogeneous neighbors when learning the embedding of a target node. As a result, the node vectors generated by HeMGNN carry richer topological and semantic information from the heterogeneous network, and this rich semantic information lets them perform well in downstream domain mining tasks. Experimental results show that, compared with the baseline models, HeMGNN improves average classification and clustering performance by up to 0.3141 and 0.2235, respectively.
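The abstract does not say how HeMGNN generates or scores candidate metapaths. One simple way to generate candidates automatically is a depth-bounded walk over the network schema (the graph of node types), keeping the paths that start and end at the target node type. The sketch below does this. Keeping only symmetric paths, the `relevance` function used for filtering, and the names `enumerate_metapaths` and `filter_metapaths` are all assumptions for illustration; HeMGNN's actual filter is task-driven and is not described in the abstract.

```python
def enumerate_metapaths(schema, target, max_len):
    """Enumerate candidate symmetric metapaths that start and end at `target`.

    schema: dict mapping each node type to the node types it links to.
    max_len: maximum number of relations (edges) in a metapath.
    Returns tuples of node types in depth-first order.
    """
    results = []

    def extend(path):
        if len(path) - 1 >= max_len:
            return
        for nxt in sorted(schema.get(path[-1], ())):
            candidate = path + (nxt,)
            # Keep paths that return to the target type and read the same both ways.
            if nxt == target and candidate == candidate[::-1]:
                results.append(candidate)
            extend(candidate)

    extend((target,))
    return results


def filter_metapaths(paths, relevance, k):
    """Keep the k paths with the highest task relevance (hypothetical caller-supplied scorer)."""
    return sorted(paths, key=relevance, reverse=True)[:k]
```

On an author-paper-venue schema `{"A": {"P"}, "P": {"A", "V"}, "V": {"P"}}` with target type `"P"` and `max_len=4`, this yields P-A-P, P-A-P-A-P, P-V-P and P-V-P-V-P. The mixed path P-A-P-V-P is dropped because it is not symmetric. In a real system, `relevance` might be, for instance, validation performance on the downstream task using each metapath's subgraph.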