Heterogeneous information networks (HINs) are ubiquitous in real-world applications. Due to the heterogeneity in HINs, the typed edges may not fully align with each other. In order to capture the semantic subtlety, we propose the concept of aspects, with each aspect being a unit representing one underlying semantic facet. Meanwhile, network embedding has emerged as a powerful method for learning network representations, where the learned embedding can be used as features in various downstream applications. Therefore, we are motivated to propose a novel embedding learning framework, ASPEM, to preserve the semantic information in HINs based on multiple aspects. Instead of preserving information of the network in one semantic space, ASPEM encapsulates information regarding each aspect individually. In order to select aspects for embedding purposes, we further devise a solution for ASPEM based on dataset-wide statistics. To corroborate the efficacy of ASPEM, we conducted experiments on two real-world datasets with two types of applications: classification and link prediction. Experimental results demonstrate that ASPEM can outperform baseline network embedding learning methods by considering multiple aspects, where the aspects can be selected from the given HIN in an unsupervised manner.
Relation extraction is a fundamental task in information extraction. Most existing methods rely heavily on annotations labeled by human experts, which are costly and time-consuming. To overcome this drawback, we propose a novel framework, REHESSION, to conduct relation extractor learning using annotations from heterogeneous information sources, e.g., knowledge bases and domain heuristics. These annotations, referred to as heterogeneous supervision, often conflict with each other, which brings a new challenge to the original relation extraction task: how to infer the true label from noisy labels for a given instance. Identifying context information as the backbone of both relation extraction and true label discovery, we adopt embedding techniques to learn the distributed representations of context, which bridges all components with mutual enhancement in an iterative fashion. Extensive experimental results demonstrate the superiority of REHESSION over the state-of-the-art.
Information diffusion has been widely studied in networks, aiming to model the spread of information among objects when they are connected with each other. Most of the current research assumes the underlying network is homogeneous, i.e., objects are of the same type and they are connected by links with the same semantic meanings. However, in the real world, objects are connected via different types of relationships, forming multi-relational heterogeneous information networks. In this paper, we propose to model information diffusion in such multi-relational networks by distinguishing the power of different types of relationships in passing information around. We propose two variations of the linear threshold model for multi-relational networks, by considering the aggregation of information at either the model level or the relation level. In addition, we use real diffusion action logs to learn the parameters in these models, which will benefit diffusion prediction in real networks. We apply our diffusion models to two real bibliographic information networks, the DBLP network and the APS network, and experimentally demonstrate the effectiveness of our models compared with single-relational diffusion models. Moreover, our models can determine the diffusion power of each relation type, which helps us better understand the diffusion process in the multi-relational bibliographic network scenario.
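As a rough illustration of the idea above, the following minimal Python sketch runs a linear threshold process in which influence is aggregated across relation types. It is one plausible reading and not the authors' formulation: the function name `diffuse`, the graph layout, the per-relation weights, and the per-node thresholds are all assumptions made for this example, and the weights are fixed by hand here rather than learned from diffusion action logs.

```python
from typing import Dict, List, Set

# Hypothetical layout: relation type -> node -> list of in-neighbors.
Graph = Dict[str, Dict[str, List[str]]]


def diffuse(
    graph: Graph,
    relation_weight: Dict[str, float],
    threshold: Dict[str, float],
    seeds: Set[str],
    max_steps: int = 10,
) -> Set[str]:
    """Synchronous linear threshold diffusion over several relation types.

    A node becomes active once the weighted sum, over relation types, of
    the fraction of its active in-neighbors reaches the node's threshold.
    """
    active = set(seeds)
    nodes = {node for nbrs in graph.values() for node in nbrs}
    for _ in range(max_steps):
        newly_active = set()
        for node in nodes - active:
            influence = 0.0
            for relation, nbrs in graph.items():
                neighbors = nbrs.get(node, [])
                if not neighbors:
                    continue
                frac = sum(1 for u in neighbors if u in active) / len(neighbors)
                influence += relation_weight[relation] * frac
            if influence >= threshold[node]:
                newly_active.add(node)
        if not newly_active:
            break  # no change in this round, so the process has converged
        active |= newly_active
    return active


# Toy bibliographic-style network with two assumed relation types.
toy_graph: Graph = {
    "coauthor": {"b": ["a"], "c": ["a", "b"]},
    "citation": {"c": ["d"], "d": ["c"]},
}
toy_thresholds = {"b": 0.5, "c": 0.5, "d": 0.5}
reached = diffuse(
    toy_graph, {"coauthor": 0.7, "citation": 0.3}, toy_thresholds, {"a"}
)
```

With these toy weights the coauthor relation carries most of the influence, so the seed reaches `b` and then `c`, while `d` (linked only by citation) stays inactive. Swapping the two weights stops the spread at the seed, which is the kind of difference in per-relation diffusion power that the abstract describes.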
Linguistic sequence labeling is a general approach encompassing a variety of problems, such as part-of-speech tagging and named entity recognition. Recent advances in neural networks (NNs) make it possible to build reliable models without handcrafted features. However, in many cases, it is hard to obtain sufficient annotations to train these models. In this study, we develop a neural framework to extract knowledge from raw texts and empower the sequence labeling task. Besides word-level knowledge contained in pre-trained word embeddings, character-aware neural language models are incorporated to extract character-level knowledge. Transfer learning techniques are further adopted to mediate different components and guide the language model towards the key knowledge. Compared to previous methods, this task-specific knowledge allows us to adopt a more concise model and conduct more efficient training. Different from most transfer learning methods, the proposed framework does not rely on any additional supervision. It extracts knowledge from the self-contained order information of training sequences. Extensive experiments on benchmark datasets demonstrate the effectiveness of leveraging character-level knowledge and the efficiency of co-training. For example, on the CoNLL03 NER task, model training completes in about 6 hours on a single GPU, reaching an F1 score of 91.71±0.10 without using any extra annotations.