Techniques that map the entities and relations of a knowledge graph (KG) into a low-dimensional continuous space are known as KG embedding or knowledge representation learning. However, most existing techniques learn embeddings from the facts in the KG alone and therefore suffer from the incompleteness and sparseness of the KG. Recently, exploiting textual information in KG embedding has attracted much attention because of the rich semantic information that texts supply. In this paper, we therefore present a survey of techniques for textual-information-based KG embedding. First, we introduce techniques for encoding textual information to represent entities and relations, from the perspectives of encoding models and scoring functions, respectively. Second, we summarize methods for incorporating textual information into existing embedding techniques. Third, we discuss the training procedures of textual-information-based KG embedding techniques. Finally, we explore applications of KG embedding with textual information to specific tasks such as KG completion in zero-shot scenarios, multilingual entity alignment, relation extraction, and recommender systems. We hope this survey will give researchers insight into textual-information-based KG embedding. INDEX TERMS Knowledge graph embedding, textual information, text-based embedding, text-improved embedding, embedding-based applications.
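To make the notion of a scoring function concrete, the following is a minimal sketch of a translation-based KG embedding scorer in the style of TransE, one of the fact-based techniques this survey builds on. The entity and relation names and the random toy embeddings are illustrative assumptions, not from any particular dataset or trained model.

```python
# Minimal TransE-style scoring sketch: a triple (h, r, t) is plausible
# when the head embedding translated by the relation embedding lands
# near the tail embedding, i.e., when ||h + r - t|| is small.
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy embedding tables; in practice these are learned from KG facts.
entities = {name: rng.normal(size=dim) for name in ("Paris", "France", "Berlin")}
relations = {name: rng.normal(size=dim) for name in ("capital_of",)}

def transe_score(head: str, relation: str, tail: str) -> float:
    """Lower score = more plausible triple: ||h + r - t||."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return float(np.linalg.norm(h + r - t))

print(round(transe_score("Paris", "capital_of", "France"), 3))
```

Textual-information-based methods replace or augment the lookup tables above with encoders over entity descriptions, so that unseen (zero-shot) entities can still be scored.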
Distant supervision for relation extraction (DSRE) automatically acquires large-scale annotated data by aligning a corpus with a knowledge base, which dramatically reduces the cost of manual annotation. However, the technique is plagued by noisy data, which seriously degrades model performance. In this paper, we introduce negative training to filter out such noise. Specifically, we train the model with complementary labels based on the idea that "the sentence does not express the target relation." The trained model can then discriminate noisy data within the training set. In addition, we argue that additional entity attributes (such as descriptions, aliases, and types) provide more information for sentence representation. On this basis, we propose EANT, a DSRE model with entity attributes trained via negative training. While filtering noisy sentences, EANT also relabels some false-negative sentences, converting them into useful training data. Our experimental results on the widely used New York Times dataset show that EANT significantly improves relation extraction performance over state-of-the-art baselines.
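The complementary-label idea above can be sketched as a loss function: rather than maximizing the probability of a (possibly noisy) positive label, negative training minimizes the probability assigned to a complementary label, encoding "this sentence does not express that relation." The logits and label indices below are toy values, not outputs of the actual EANT model.

```python
# Negative-training loss sketch: for a complementary label c (a relation
# the sentence is assumed NOT to express), minimize -log(1 - p(c | x)),
# pushing p(c | x) toward zero instead of pushing a noisy label toward one.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return e / e.sum()

def negative_training_loss(logits: np.ndarray, comp_label: int) -> float:
    """NT loss for one example: -log(1 - p(comp_label | x))."""
    p = softmax(logits)
    return float(-np.log(1.0 - p[comp_label] + 1e-12))

logits = np.array([2.0, 0.5, -1.0])  # scores over 3 candidate relations
# Complementary label 2: "the sentence does not express relation 2".
print(round(negative_training_loss(logits, comp_label=2), 4))
```

Note the intended behavior: the loss is large when the model still assigns high probability to the complementary label, so gradient descent suppresses that probability; after training, sentences whose annotated relation retains high probability under this regime can be flagged as noise.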