Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes

Zhou, Huiwei; Ning, Shixian; Liu, Zhe; Lang, Chengkun; Liu, Zhuang; Lei, Bizun

doi:10.1186/s12859-020-3375-3

Cited by 21 publications (13 citation statements)

References 25 publications

“…When used as a Python package, Gilda was able to ground over 20 thousand strings per second in this benchmark (Supplementary Table 4). Gilda achieved state-of-the-art F1 for proteins (.693 for human, .616 for non-human vs. .445 from [7]), cellular components (.504 vs. .476 from [8]), and small molecules (.620 vs. .591 from [8]). It underperformed for species (.586 vs. an average .623 over several configurations from [9]), cells/cell lines (.595 vs. .740 from [8]) and tissues (.446 vs. .633 from [8]), likely due to gaps in the lexical resources in Gilda covering those entity types.…”

Section: Results (mentioning)

Gilda: biomedical entity text normalization with machine-learned disambiguation as a service

Gyori, Hoyt, Steppi (2021)

Preprint

Gilda is a software tool and web service that implements a scored string-matching algorithm for names and synonyms across entries in biomedical ontologies covering genes, proteins (and their families and complexes), small molecules, biological processes and diseases. Gilda integrates machine-learned disambiguation models to choose between groundings for ambiguous strings, given relevant surrounding text as context, and supports species prioritization in case of ambiguity. The Gilda web service is available at http://grounding.indra.bio, and source code, documentation and tutorials are available via https://github.com/indralab/gilda.
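The abstract mentions scored string matching with species prioritization. The Python sketch below illustrates that general idea under stated assumptions; it is not Gilda's algorithm or API. The `Entry` type, the tiered scores (1.0 exact, 0.9 case-insensitive, 0.8 after stripping punctuation), the placeholder namespace `DB` with its invented IDs, and the `ground` function are all hypothetical. The species codes are NCBI Taxonomy IDs (9606 for human, 10090 for mouse).

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass(frozen=True)
class Entry:
    """Hypothetical lexicon entry: one name or synonym pointing to one grounding."""
    text: str     # name or synonym as listed in the source ontology
    db: str       # namespace of the grounding (placeholder here)
    db_id: str    # identifier within that namespace (placeholder here)
    species: str  # NCBI Taxonomy ID of the organism

def normalize(s: str) -> str:
    """Lowercase and drop every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", s.lower())

def score(query: str, entry: Entry) -> float:
    """Assumed tiered score: exact > case-insensitive > punctuation-insensitive."""
    if query == entry.text:
        return 1.0
    if query.lower() == entry.text.lower():
        return 0.9
    if normalize(query) == normalize(entry.text):
        return 0.8
    return 0.0

def ground(query: str, lexicon: Sequence[Entry],
           species_priority: Sequence[str] = ("9606",)) -> List[Tuple[float, Entry]]:
    """Return (score, entry) matches, best score first; ties go to prioritized species."""
    def species_rank(entry: Entry) -> int:
        if entry.species in species_priority:
            return list(species_priority).index(entry.species)
        return len(species_priority)  # unprioritized species sort last

    matches = [(score(query, e), e) for e in lexicon]
    matches = [(s, e) for s, e in matches if s > 0.0]
    return sorted(matches, key=lambda m: (-m[0], species_rank(m[1])))

# Placeholder entries; "DB" and the IDs are invented for illustration.
EXAMPLE_LEXICON = [
    Entry("TP53", "DB", "human-1", "9606"),
    Entry("Trp53", "DB", "mouse-1", "10090"),
    Entry("p53", "DB", "mouse-2", "10090"),
    Entry("p53", "DB", "human-2", "9606"),
]
```

Here `ground("p53", EXAMPLE_LEXICON)` returns the human entry before the mouse one: both match the query exactly, so the tie is broken by the order of `species_priority`.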

“…Medical named entity recognition and normalization are two basic tasks for medical text mining. Conventional pipeline frameworks contain separate NER and NEN models (Vázquez et al., 2008; Sahu and Anand, 2016; Zhou et al., 2020). NER models extract medical mentions from texts, and NEN models then map these mentions to standard entity identifiers.…”

Section: Medical Named Entity Recognition and Normalization (mentioning)
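The pipeline design described in this excerpt, and the way errors propagate from NER to NEN, can be illustrated with a deliberately simple Python sketch. The dictionary-lookup stages, the function names, and the placeholder identifiers are hypothetical stand-ins for trained NER and NEN models.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # character offsets [start, end) of a mention

def dictionary_ner(text: str, vocabulary: List[str]) -> List[Span]:
    """Toy NER stage: return every case-insensitive occurrence of a known term."""
    lowered = text.lower()
    spans = []
    for term in vocabulary:
        needle = term.lower()
        start = lowered.find(needle)
        while start != -1:
            spans.append((start, start + len(needle)))
            start = lowered.find(needle, start + 1)
    return sorted(spans)

def lookup_nen(mention: str, kb: Dict[str, str]) -> Optional[str]:
    """Toy NEN stage: map a mention to a standard identifier, or None if unknown."""
    return kb.get(mention.lower())

def pipeline(text: str, vocabulary: List[str],
             kb: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Run NER, then NEN on each extracted mention; NEN never sees missed mentions."""
    return [(text[s:e], lookup_nen(text[s:e], kb))
            for s, e in dictionary_ner(text, vocabulary)]

# Placeholder knowledge base: both concepts are known to the NEN stage...
KB = {"breast cancer": "ID:0001", "heart attack": "ID:0002"}
# ...but the NER vocabulary is incomplete, so "heart attack" is never extracted.
NER_VOCAB = ["breast cancer"]
TEXT = "Patients with breast cancer or a prior heart attack were excluded."
```

With the incomplete vocabulary, `pipeline(TEXT, NER_VOCAB, KB)` yields only the breast cancer mention. The heart attack concept is in the knowledge base, but because the NER stage missed it, the NEN stage never gets a chance to normalize it.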

“…Distributed representations of text, such as Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014), are used to calculate the similarity distance between two texts. Some medical named entity normalization models are based on this method (…; Zhou et al., 2020). Considering that local text features are more important than global ones, some researchers used convolutional neural networks (CNN) to extract local features and exploited an interactive attention mechanism to match the semantic similarity of two texts (Yin et al., 2016; …).…”

Section: Short Text Matching (mentioning)
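The embedding-similarity approach in this excerpt can be sketched as follows: represent each text as the average of its word vectors, then normalize a mention to the knowledge-base name with the highest cosine similarity. The three-dimensional vectors are invented toy values standing in for pretrained Word2Vec or GloVe embeddings, the identifiers are placeholders, and averaging is one common choice of text representation, not necessarily the one used by the cited models.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def text_vector(text: str, emb: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens (zero vector if none are known)."""
    dim = len(next(iter(emb.values())))
    vecs = [emb[t] for t in text.lower().split() if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def normalize_mention(mention: str, kb: Dict[str, str],
                      emb: Dict[str, Vector]) -> Tuple[str, float]:
    """Return (identifier, similarity) of the most similar knowledge-base name."""
    m = text_vector(mention, emb)
    scored = [(kb_id, cosine(m, text_vector(name, emb))) for kb_id, name in kb.items()]
    return max(scored, key=lambda pair: pair[1])

# Toy embeddings (invented values); real systems would load pretrained vectors.
EMB = {
    "heart": [1.0, 0.0, 0.0], "attack": [0.0, 1.0, 0.0],
    "myocardial": [0.9, 0.1, 0.0], "infarction": [0.1, 0.9, 0.1],
    "kidney": [0.0, 0.0, 1.0], "failure": [0.0, 0.2, 0.9],
}
KB = {"ID:MI": "myocardial infarction", "ID:RF": "kidney failure"}
```

With these toy vectors, `normalize_mention("heart attack", KB, EMB)` picks `ID:MI` even though the mention shares no words with "myocardial infarction", which is the point of matching in embedding space rather than on surface strings.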

An End-to-End Progressive Multi-Task Learning Framework for Medical Named Entity Recognition and Normalization

Zhou, Cai, Zhang et al. (2021)

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

Medical named entity recognition (NER) and normalization (NEN) are fundamental for constructing knowledge graphs and building QA systems. Existing implementations of medical NER and NEN suffer from error propagation between the two tasks: mentions mispredicted by NER directly affect the results of NEN. The NER module is therefore the bottleneck of the whole system. In addition, features that can be learned jointly for both tasks are beneficial for improving model performance. To avoid the disadvantages of existing models and exploit a generalized representation across the two tasks, we design an end-to-end progressive multi-task learning model that jointly models medical NER and NEN in an effective way. The framework contains tasks at three levels of progressively increasing difficulty. These progressive tasks reduce error propagation through incremental task settings, meaning that lower-level tasks gain supervision signals, rather than errors, from higher-level tasks to improve their performance. In addition, context features are exploited to enrich the semantic information of entity mentions extracted by NER, and NEN performance benefits from these enhanced mention features. Standard entities from knowledge bases are also introduced into the NER module to help it extract the corresponding entity mentions correctly. Empirical results on two publicly available medical literature datasets demonstrate the superiority of our method over nine typical methods.

“…In the past 2 decades, a large amount of work has been done to address this problem in the biomedical domain [11-17]. All of this work is supported by the existence of diverse biomedical vocabularies and standards such as the Unified Medical Language System [18], together with a large amount of annotated biomedical data (e.g., in the domain of drugs, diseases, and other treatments) collected from numerous biomedical NLP workshops [19-26].…”

Section: Introduction (mentioning)

A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation

Stojanov, Popovski, Cenikj et al. (2021)

J Med Internet Res

Background: Food science has recently been attracting a lot of attention. Many research questions remain open about how food, one of the main environmental factors, interacts with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which draws attention to the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, and these methods often depend on external resources. In 2019, however, an annotated corpus of food entities and their normalization, built using several food semantic resources, was published.

Objective: In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction.

Methods: We introduce FoodNER, a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags.

Results: All BERT models achieved very promising results, with macro F1 scores of 93.30% to 94.31% on the task of distinguishing food from nonfood entities, which represents the new state of the art in food information extraction. On the tasks where semantic tags are predicted, all BERT models again obtained very promising results, with macro F1 scores ranging from 73.39% to 78.96%.

Conclusions: FoodNER can be used to extract and annotate food entities in 5 different tasks: distinguishing food versus nonfood entities, and distinguishing food entities at the level of food groups using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.
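The FoodNER results are reported as macro F1 scores. As a reminder of what that metric averages, here is a minimal Python sketch of the standard definition: compute F1 for each label separately, then take the unweighted mean, so rare labels count as much as frequent ones. It is an illustration of the metric, not FoodNER's evaluation code, and the labels and predictions used with it are invented.

```python
from typing import List, Sequence

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), taken as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over every label seen in gold or predictions."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

For `gold = ["food", "food", "nonfood", "nonfood"]` and `pred = ["food", "nonfood", "nonfood", "nonfood"]`, the per-label F1 scores are 2/3 (food) and 4/5 (nonfood), so the macro F1 is 11/15, about 0.733.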