Background: Automatic and accurate extraction of diverse biomedical relations from the literature is a crucial subtask of biomedical text mining. Currently, stacking classification networks on pre-trained language models and fine-tuning them end to end is a common framework for solving the biomedical relation extraction (BioRE) problem. However, sequence-based pre-trained language models underutilize the graphical topology of language, and sequence-oriented deep neural networks are limited in their ability to process graphical features.
Results: In this paper, we propose BioEGRE (BioELECTRA & Graph pointer neural network for Relation Extraction), a novel method for the sentence-level BioRE task that capitalizes on the topological features of language. First, the biomedical literature is preprocessed, retaining only sentences that contain the target entity pair. Second, SciSpaCy performs dependency parsing, and each sentence is modeled as a graph based on the parsing result; BioELECTRA generates token-level representations, which serve as the attributes of the nodes in the sentence graph; a graph pointer neural network layer selects the most relevant multi-hop neighbors to optimize these representations; and a fully connected layer produces the sentence-level representation. Finally, a Softmax function computes the class probabilities. Our method is evaluated on one multi-class (CHEMPROT) and two binary (GAD and EU-ADR) BioRE tasks, achieving F1-scores of 79.97% (CHEMPROT), 83.31% (GAD), and 83.51% (EU-ADR), which outperforms existing state-of-the-art models.
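The graph-construction and neighbor-selection steps above can be sketched as follows. This is a minimal illustration, not the BioEGRE implementation: the tokens, dependency edges, scores, and helper names (`build_sentence_graph`, `k_hop_neighbors`, `select_top_neighbors`) are hypothetical stand-ins, and in the actual pipeline the edges would come from a SciSpaCy parse and the relevance scores from learned pointer attention over BioELECTRA embeddings.

```python
from collections import defaultdict

def build_sentence_graph(edges):
    """Build an undirected adjacency list from dependency (head, child) pairs."""
    adj = defaultdict(set)
    for head, child in edges:
        adj[head].add(child)
        adj[child].add(head)
    return adj

def k_hop_neighbors(adj, node, k):
    """Return all nodes reachable within k hops of `node` (excluding itself)."""
    seen = {node}
    frontier = {node}
    for _ in range(k):
        frontier = {n for f in frontier for n in adj[f]} - seen
        seen |= frontier
    return seen - {node}

def select_top_neighbors(scores, candidates, m):
    """Pointer-style selection: keep the m highest-scoring candidate neighbors."""
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:m]

# Toy sentence "Aspirin(0) inhibits(1) COX-2(2)" with dependency edges
# inhibits -> Aspirin and inhibits -> COX-2 (as a parser might emit).
edges = [(1, 0), (1, 2)]
adj = build_sentence_graph(edges)
hops = k_hop_neighbors(adj, 0, 2)          # {1, 2}: both tokens within 2 hops
best = select_top_neighbors({1: 0.9, 2: 0.1}, sorted(hops), 1)  # [1]
```

The pointer layer's role is captured by `select_top_neighbors`: rather than aggregating over the full k-hop neighborhood, it keeps only the neighbors deemed most relevant to the entity pair.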
Conclusion: The experimental results on three biomedical benchmark datasets demonstrate the effectiveness and generalizability of BioEGRE, indicating that linguistic topology combined with a graph pointer neural network layer improves performance on BioRE tasks.