Biomedical relation extraction is an important research subject in natural language processing (NLP). Deep learning has recently shown great value in improving the accuracy of relation extraction. Existing methods mostly focus on extracting (1) specific relations from short text (eg, drug-drug interactions and protein-protein interactions) and (2) unspecific relations from full-text corpora. However, extracting unspecific relations from short text, which is increasingly important in practical use, is rarely studied. In this paper, a new model called MAT-LSTM is proposed to extract unspecific relations from short text in the biomedical literature. Experiments on two BioCreative benchmark datasets and one BioNLP benchmark dataset were conducted to evaluate the proposed MAT-LSTM model, and it achieved better performance. The MAT-LSTM model was also applied in practice to extract unspecific relations from the PubMed literature. Most of the results extracted from PubMed with the proposed model were verified by experts, indicating the practical value of the MAT-LSTM model.

KEYWORDS
biomedical relation, deep learning, natural language processing, unspecific relation
Concurrency Computat Pract Exper. 2020;32:e5005. wileyonlinelibrary.com/journal/cpe

INTRODUCTION

With the increasing popularity of precision medicine, the extraction of semantic relations between entities from the biomedical literature has attracted widespread attention in information extraction and natural language processing. 1 Discovering relations between different biomedical factors (eg, genome, metabolome, and transcriptome) is the foundation for better serving precision medicine. 2 Existing methods for extracting specific types of biomedical relations (such as drug-drug interactions 3 and gene-disease interactions 4 ) are mature, whereas a method for extracting unspecific types of relations is still needed to discover relations of different types. In this paper, a model is proposed to extract such unspecific relations.

Existing biomedical relation extraction methods, including co-occurrence-based methods, rule-based methods, and machine learning methods, can be classified into two categories: (1) extracting unspecific relations from the full text of the literature and (2) extracting one specific type of relation (eg, gene-disease interaction and drug-disease interaction) from short text instead of the full text.

(1) Extracting unspecific relations in the literature from the full text.

The co-occurrence-based approach is the simplest and most direct one. It has two characteristics: first, within the same sentence, the closer two entities are to each other, the greater their correlation; second, the more often the two entities appear in the same sentence, the greater