Background: Automatic and accurate extraction of diverse biomedical relations from the literature is a crucial subtask of biomedical text mining. Currently, stacking classification networks on pre-trained language models and fine-tuning them end to end is a common framework for solving the biomedical relation extraction (BioRE) problem. However, sequence-based pre-trained language models underutilize the graphical topology of language, and sequence-oriented deep neural networks are limited in their ability to process graphical features.
Results: In this paper, we propose BioEGRE (BioELECTRA & Graph pointer neural network for Relation Extraction), a novel method for the sentence-level BioRE task that capitalizes on the topological features of language. First, the biomedical literature is preprocessed, retaining only sentences that contain the target entity pair. Second, SciSpaCy performs dependency parsing, and each sentence is modeled as a graph based on the parsing result; BioELECTRA generates token-level representations, which serve as the attributes of the nodes in the sentence graph; a graph pointer neural network layer selects the most relevant multi-hop neighbors to optimize these representations; and a fully connected layer produces the sentence-level representation. Finally, a Softmax function computes the class probabilities. Our method is evaluated on one multi-class (CHEMPROT) and two binary (GAD and EU-ADR) BioRE tasks, achieving F1-scores of 79.97% (CHEMPROT), 83.31% (GAD), and 83.51% (EU-ADR), which outperforms existing state-of-the-art models.
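The graph-construction and neighbor-selection steps above can be sketched as follows. This is a minimal illustration, not the BioEGRE implementation: the tokens, dependency edges, scores, and helper names (`build_sentence_graph`, `k_hop_neighbors`, `select_top_neighbors`) are hypothetical stand-ins, and in the actual pipeline the edges would come from a SciSpaCy parse and the relevance scores from learned pointer attention over BioELECTRA embeddings.

```python
from collections import defaultdict

def build_sentence_graph(edges):
    """Build an undirected adjacency list from dependency (head, child) pairs."""
    adj = defaultdict(set)
    for head, child in edges:
        adj[head].add(child)
        adj[child].add(head)
    return adj

def k_hop_neighbors(adj, node, k):
    """Return all nodes reachable within k hops of `node` (excluding itself)."""
    seen = {node}
    frontier = {node}
    for _ in range(k):
        frontier = {n for f in frontier for n in adj[f]} - seen
        seen |= frontier
    return seen - {node}

def select_top_neighbors(scores, candidates, m):
    """Pointer-style selection: keep the m highest-scoring candidate neighbors."""
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:m]

# Toy sentence "Aspirin(0) inhibits(1) COX-2(2)" with dependency edges
# inhibits -> Aspirin and inhibits -> COX-2 (as a parser might emit).
edges = [(1, 0), (1, 2)]
adj = build_sentence_graph(edges)
hops = k_hop_neighbors(adj, 0, 2)          # {1, 2}: both tokens within 2 hops
best = select_top_neighbors({1: 0.9, 2: 0.1}, sorted(hops), 1)  # [1]
```

The pointer layer's role is captured by `select_top_neighbors`: rather than aggregating over the full k-hop neighborhood, it keeps only the neighbors deemed most relevant to the entity pair.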
Conclusion: The experimental results on three biomedical benchmark datasets demonstrate the effectiveness and generalizability of BioEGRE, indicating that linguistic topology combined with a graph pointer neural network layer improves performance on BioRE tasks.