2020
DOI: 10.1186/s12859-020-3375-3

Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes

Abstract: Background: Automated biomedical named entity recognition and normalization serves as the basis for many downstream applications in information management. However, this task is challenging due to name variations and entity ambiguity. A biomedical entity may have multiple variants and a variant could denote several different entity identifiers. Results: To remedy the above issues, we present a novel knowledge-enhanced system for protein/gene named entity recognition (PNER) and normalization (PNEN). On one hand…

Cited by 21 publications (13 citation statements)
References 25 publications

“…When used as a Python package, Gilda was able to ground over 20 thousand strings per second in this benchmark (Supplementary Table 4). Gilda achieved state of the art F1 for proteins (.693 for human, .616 for non-human vs. .445 from [7]), cellular components (.504 vs. .476 from [8]), and small molecules (.620 vs. .591 from [8]). It underperformed for species (.586 vs. an average .623 over several configurations from [9]), cells/cell lines (.595 vs. .740 from [8]), and tissues (.446 vs. .633 from [8]), likely due to gaps in the lexical resources in Gilda covering those entity types.…”
Section: Results (mentioning)
confidence: 99%
“…Medical named entity recognition and normalization are two basic tasks for medical text mining. The conventional pipeline frameworks contain the NER model and the NEN model separately (Vázquez et al, 2008; Sahu and Anand, 2016; Zhou et al, 2020). NER models extract medical mentions in texts and then NEN models map these mentions to standard entity identifiers.…”
Section: Medical Named Entity Recognition and Normalization (mentioning)
confidence: 99%
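
The two-stage pipeline described in this statement (an NER step that extracts mentions, then an NEN step that maps them to identifiers) can be illustrated with a minimal sketch. It is not the method of any cited paper: it swaps in a plain dictionary lookup for both stages, and the lexicon entries, the placeholder identifiers (`GENE:0001`, `GENE:0002`), and the function names `recognize`, `normalize`, and `pipeline` are all hypothetical.

```python
import re

# Hypothetical toy lexicon: lowercase name variants -> placeholder identifiers.
# Several variants may share one identifier (name variation).
LEXICON = {
    "tumor protein p53": "GENE:0001",
    "tp53": "GENE:0001",
    "p53": "GENE:0001",
    "brca1": "GENE:0002",
}

# Longest variants first, so the regex alternation prefers longer names.
_VARIANTS = sorted(LEXICON, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in _VARIANTS) + r")\b",
    re.IGNORECASE,
)


def recognize(text):
    """NER step: return (start, end, mention) for each lexicon match."""
    return [(m.start(), m.end(), m.group(0)) for m in _PATTERN.finditer(text)]


def normalize(mention):
    """NEN step: map a mention to its identifier, or None if unknown."""
    return LEXICON.get(mention.lower())


def pipeline(text):
    """Run NER, then NEN, and return (mention, identifier) pairs."""
    return [(mention, normalize(mention)) for _, _, mention in recognize(text)]
```

Calling `pipeline("Mutations in TP53 and BRCA1 were found.")` returns one pair per recognized mention. A real system would replace both stages with learned models and would also have to handle the ambiguity case, where one variant maps to several identifiers.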
“…The distributed representations of texts, such as Word2Vec (Mikolov et al, 2013) and GloVe (Pennington et al, 2014), are utilized to calculate the similarity distance between two texts. Some medical named entity normalization models are based on this method (Zhou et al, 2020). Considering local texts are more important than global ones, some researchers utilized convolution neural networks (CNN) to extract local features and exploited interactive attention mechanism to match the semantic similarity of two texts (Yin et al, 2016; …).…”
Section: Short Text Matching (mentioning)
confidence: 99%
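
The embedding-based matching mentioned in this statement can be sketched as follows, under stated assumptions: each text is represented by the average of its word vectors and candidates are ranked by cosine similarity. The three-dimensional vectors in `VECTORS` are hypothetical stand-ins (real Word2Vec or GloVe vectors are learned from corpora and are much larger), and `embed`, `cosine`, and `rank_candidates` are illustrative names, not an API from any cited work.

```python
import math

# Hypothetical toy word vectors, for illustration only.
VECTORS = {
    "tumor": [1.0, 0.0, 0.0],
    "protein": [0.0, 1.0, 0.0],
    "antigen": [0.0, 0.8, 0.6],
    "kinase": [0.0, 0.0, 1.0],
}


def embed(text):
    """Average the vectors of known tokens; None if no token is known."""
    rows = [VECTORS[t] for t in text.lower().split() if t in VECTORS]
    if not rows:
        return None
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(mention, candidates):
    """Return candidate names sorted by similarity to the mention, best first."""
    m = embed(mention)
    if m is None:
        return []
    scored = []
    for name in candidates:
        c = embed(name)
        if c is not None:
            scored.append((cosine(m, c), name))
    return [name for _, name in sorted(scored, reverse=True)]
```

For normalization, the candidates would be the names attached to entity identifiers, and the top-ranked name would determine the predicted identifier.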
“…In the past 2 decades, a large amount of work has been done to address this problem in the biomedical domain [11-17]. All of this work is supported by the existence of diverse biomedical vocabularies and standards such as the Unified Medical Language System [18], together with the collection of a large amount of annotated biomedical data (eg, in the domain of drugs, diseases, and other treatments) from numerous biomedical NLP workshops [19-26].…”
Section: Introduction (mentioning)
confidence: 99%