This paper targets the design of a query-based dataset recommendation system, which accepts a query that expresses a user's research interest as a set of research papers and returns a list of recommended datasets ranked by their potential usefulness for the user's research needs. The motivation for building such a system is to spare users the time-consuming literature review otherwise needed to find usable datasets. We start by constructing a two-layer network: one layer is a citation network, and the other is a layer of datasets, each connected to the first-layer papers in which it was used. A query highlights a set of papers in the citation layer. However, answering the query by naively retrieving the datasets linked to these highlighted papers excludes other semantically relevant datasets, which commonly lie several hops away from the queried papers. We propose to learn representations of research papers and datasets in the two-layer network using a heterogeneous variational graph autoencoder, and then compute the relevance of the query to the dataset candidates based on the learned representations. Extensive evaluation results show that our ranked datasets are more relevant than those obtained by naive retrieval methods and by adaptations of existing related solutions.
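As a rough illustration of the final ranking step only, the sketch below scores dataset candidates against a query using embeddings that are assumed to be already learned. The abstract does not say how relevance is computed, so the mean-pooling of the query papers, the cosine similarity, the toy vectors, and the names `rank_datasets`, `paper_emb`, and `dataset_emb` are all assumptions made for illustration, not the paper's method.

```python
import numpy as np

def rank_datasets(query_paper_ids, paper_emb, dataset_emb):
    """Return dataset indices sorted from most to least relevant to the query.

    Assumption (not from the paper): relevance is the cosine similarity between
    the mean embedding of the query papers and each dataset embedding.
    """
    # Pool the query papers into a single unit-length query vector.
    query_vec = paper_emb[query_paper_ids].mean(axis=0)
    query_vec = query_vec / np.linalg.norm(query_vec)
    # Normalize each dataset embedding so the dot product is a cosine similarity.
    ds_unit = dataset_emb / np.linalg.norm(dataset_emb, axis=1, keepdims=True)
    scores = ds_unit @ query_vec
    order = np.argsort(-scores)  # descending by score
    return order, scores[order]

# Toy embeddings (made up): 3 papers, 3 datasets, 2 dimensions.
paper_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
dataset_emb = np.array([[0.0, 2.0], [3.0, 0.2], [1.0, 1.0]])

# The query is the set of papers {0, 1}.
order, scores = rank_datasets([0, 1], paper_emb, dataset_emb)
```

Because the ranking uses embeddings rather than direct paper-dataset links, a dataset with no edge to any queried paper can still receive a high score, which is the behavior the abstract contrasts with naive retrieval.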
In this paper, we study an automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, chemicals, drugs, and genes extracted from databases of biomedical publications. Most prior studies of this problem focused on the static information of terms and largely ignored the temporal dynamics of scientific term relations. Even the few recent studies that considered the dynamics learned representations for the individual scientific terms rather than for the term-pair relations. Since the HG problem is to predict term-pair connections, it is not enough to know with whom the terms are connected; it is more important to know how the connections have been formed (in a dynamic process). We formulate this HG problem as future connectivity prediction in a dynamic attributed graph. The key is to capture the temporal evolution of node-pair (term-pair) relations. We propose an inductive edge (node-pair) embedding method named T-PAIR, which uses both the graph structure and the node attributes to encode the temporal node-pair relationship. We demonstrate the efficiency of the proposed model on three real-world datasets, which are three graphs constructed from PubMed papers published until 2019 in Neurology, Immunotherapy, and Virology, respectively. Evaluations were conducted on predicting future term-pair relations between millions of seen terms (in the transductive setting), as well as on relations involving unseen terms (in the inductive setting). Experimental results and case study analyses show the effectiveness of the proposed model.
Finding popular datasets to work on is essential for data-driven research domains. In this paper, we focus on the problem of extracting the top-k popular datasets that have been used in the data mining, machine learning, and artificial intelligence fields. We solve this problem on an attributed citation network, which includes node content information (the text of published papers) and paper citation relations. By formulating the problem as semi-supervised multi-label classification, we develop an efficient deep generative model that learns from both the document content and the citation relations. Evaluation on a real-world dataset shows that our proposed model outperforms baseline methods. We then apply the model to reveal the top-k most frequently cited datasets in selected areas and report interesting findings.
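To make the link between multi-label classification and a top-k list concrete, the sketch below assumes a classifier has already produced, for every paper, a probability per dataset label. The 0.5 threshold, the counting rule, the toy probabilities, and the names `top_k_datasets` and `label_probs` are assumptions for illustration; the abstract does not describe how the model's outputs are aggregated.

```python
import numpy as np

def top_k_datasets(label_probs, dataset_names, k, threshold=0.5):
    """Return the k datasets predicted to be used by the most papers.

    label_probs: array of shape (n_papers, n_datasets) with per-label
    probabilities from some multi-label classifier (assumed given).
    """
    predicted = label_probs >= threshold      # boolean label assignments
    counts = predicted.sum(axis=0)            # papers per dataset
    order = np.argsort(-counts, kind="stable")[:k]
    return [(dataset_names[i], int(counts[i])) for i in order]

# Toy predictions (made up): 4 papers, 3 candidate datasets.
label_probs = np.array([
    [0.9, 0.2, 0.7],
    [0.8, 0.1, 0.4],
    [0.6, 0.9, 0.3],
    [0.2, 0.1, 0.8],
])
names = ["dataset_A", "dataset_B", "dataset_C"]

top2 = top_k_datasets(label_probs, names, k=2)
```

Each paper can carry several dataset labels at once, which is why the task is multi-label rather than multi-class.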
In this work, we study the semi-supervised multi-label node classification problem in attributed graphs. Classic solutions to multi-label node classification follow two steps: first learn a node embedding, and then build a node classifier on the learned embedding. To improve the discriminating power of the node embedding, we propose a novel collaborative graph walk, named Multi-Label-Graph-Walk, which fine-tunes node representations with the available label assignments in attributed graphs via reinforcement learning. The proposed method formulates the multi-label node classification task as simultaneous graph walks conducted by multiple label-specific agents. Furthermore, the policies of the label-wise graph walks are learned cooperatively to capture, first, the predictive relation between node labels and the structural attributes of graphs, and second, the correlation among the multiple label-specific classification tasks. A comprehensive experimental study demonstrates that the proposed method achieves significantly better multi-label classification performance than state-of-the-art approaches and conducts more efficient graph exploration.
In this paper, we study the graph-based semi-supervised learning for classifying nodes in attributed networks, where the nodes and edges possess content information. Recent approaches like graph convolution networks and attention mechanisms have been proposed to ensemble the first-order neighbors and incorporate the relevant neighbors. However, it is costly (especially in memory) to consider all neighbors without a prior differentiation. We propose to explore the neighborhood in a reinforcement learning setting and find a walk path well-tuned for classifying the unlabelled target nodes. We let an agent (of node classification task) walk over the graph and decide where to direct to maximize classification accuracy. We define the graph walk as a partially observable Markov decision process (POMDP). The proposed method is flexible for working in both transductive and inductive setting. Extensive experiments on four datasets demonstrate that our proposed method outperforms several state-of-the-art methods. Several case studies also illustrate the meaningful movement trajectory made by the agent.
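The idea of an agent that sees only its local neighborhood at each step can be sketched as follows. This is a minimal illustration under stated assumptions: in the paper the policy is learned with reinforcement learning, whereas `greedy_similarity_policy` here is a fixed placeholder heuristic, and the toy graph, the features, the no-revisit rule, and all function names are hypothetical.

```python
import numpy as np

def walk(adj, feats, start, policy, max_steps=3):
    """Walk from the target node `start`, choosing each move from local observations."""
    path = [start]
    current = start
    for _ in range(max_steps):
        # Assumption for this sketch: the agent does not revisit nodes.
        candidates = [n for n in adj[current] if n not in path]
        if not candidates:
            break
        # Partial observability: the policy sees only the target's features
        # and the features of the current node's unvisited neighbors.
        current = policy(feats[start], {n: feats[n] for n in candidates})
        path.append(current)
    return path

def greedy_similarity_policy(target_feat, neighbor_feats):
    """Placeholder policy: move to the neighbor whose features best match the target."""
    return max(neighbor_feats, key=lambda n: float(target_feat @ neighbor_feats[n]))

# Toy attributed graph (made up): adjacency lists and 2-d node features.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]])

path = walk(adj, feats, start=0, policy=greedy_similarity_policy)
```

Only the neighbors of the current node are examined at each step, so the walk never has to materialize the full neighborhood of the target node, which is the memory cost the abstract raises.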