2020
DOI: 10.21105/joss.01708

Adeft: Acromine-based Disambiguation of Entities from Text with applications to the biomedical literature

Cited by 8 publications (9 citation statements)
References 11 publications
“…Next, “grounding mapping” was performed to correct systematic errors in named entity normalization, which often arise due to the ambiguity of biomedical naming conventions. INDRA integrates both a manually-curated mapping table to fix entities frequently mis-identified by reading systems (described in detail in (Bachman et al, 2018)), and a set of machine learned models to perform disambiguation based on text context (by integrating the Adeft (Steppi et al, 2020) and Gilda (Gyori et al, 2022) systems). “ER” is an example of a common but ambiguous entity: it can stand for endoplasmic reticulum, estrogen receptor, estradiol receptor, emergency room, and a variety of other entities and concepts depending on context.…”
Section: Results (mentioning)
confidence: 99%
“…Gilda currently contains such disambiguation models for 1,008 ambiguous strings (e.g., "HK4", "p42"). Gilda also integrates 153 disambiguation models made available by the Adeft system [5]. Adeft models are classifiers that can choose between senses of the most commonly occurring acronyms in biology literature with multiple senses (e.g., "ER") given surrounding text context.…”
Section: Results (mentioning)
confidence: 99%
“…Gilda offers the feature to disambiguate between the different senses of an entity text through a set of trained logistic regression models, using context provided by the user. Gilda v0.6.1 provides models for disambiguating 1,008 different entity texts and also makes use of models for 172 entity texts provided by the Adeft package [5], allowing for disambiguation of a total of 1,180 ambiguous entity texts. Below we describe the process used to train Gilda's models and select them for inclusion.…”
Section: Disambiguation Models (mentioning)
confidence: 99%
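
The statement above describes disambiguation by logistic regression over surrounding text context. As a rough illustration only, the sketch below trains a tiny bag-of-words logistic regression, in pure Python, to choose between two senses of "ER". It is not the Gilda or Adeft implementation: the training sentences, the function names (`train`, `disambiguate`), and the hyperparameters are all hypothetical and chosen just to show the idea.

```python
import math
from collections import Counter

# Hypothetical toy contexts for the ambiguous string "ER" (not real training data).
TRAIN = [
    ("protein folding stress in the ER lumen and golgi", "endoplasmic reticulum"),
    ("ER stress triggers the unfolded protein response", "endoplasmic reticulum"),
    ("ER positive breast cancer responds to tamoxifen", "estrogen receptor"),
    ("estradiol binds ER in breast tumor cells", "estrogen receptor"),
]
POSITIVE = "estrogen receptor"
NEGATIVE = "endoplasmic reticulum"


def tokenize(text):
    """Lowercase whitespace tokenization; a real system would do more."""
    return text.lower().split()


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, text):
    """Linear score of a context under the bag-of-words model."""
    feats = Counter(tokenize(text))
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in feats.items())


def train(examples, positive, epochs=200, lr=0.5):
    """Fit logistic regression by plain stochastic gradient ascent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            target = 1.0 if label == positive else 0.0
            err = target - sigmoid(score(weights, bias, text))
            bias += lr * err
            for tok, n in Counter(tokenize(text)).items():
                weights[tok] = weights.get(tok, 0.0) + lr * err * n
    return weights, bias


WEIGHTS, BIAS = train(TRAIN, POSITIVE)


def disambiguate(context):
    """Return the more probable sense of "ER" for the given context."""
    p = sigmoid(score(WEIGHTS, BIAS, context))
    return POSITIVE if p >= 0.5 else NEGATIVE


if __name__ == "__main__":
    print(disambiguate("tamoxifen treatment of ER positive breast tumor"))
    print(disambiguate("unfolded protein accumulates in the ER lumen"))
```

Words that appear with only one sense in the toy data (for example "tamoxifen" or "lumen") end up with weights of the corresponding sign, so contexts dominated by such words are assigned to that sense.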
“…SAPK4) often with dashes in different places and various misspellings. In principle, NLP systems should be able to overcome these inconsistencies via robust grounding algorithms, but we find that misspellings, errors in residue numbering, use of mouse names for human proteins and vice versa, and use of ambiguous acronyms remain a substantial barrier to assembly of systematic knowledge about kinases (Bachman et al, 2019; Steppi et al, 2020) and presumably other classes of proteins as well. Moreover, in the scientific literature, findings about kinases are often described at the level of protein families (e.g., MEK, AKT, ERK) or complexes (e.g., mTORC1, PI3K) rather than one or more of their specific protein members (Bachman et al, 2018).…”
Section: Discussion (mentioning)
confidence: 97%