Proceedings of the Workshop on BioNLP 2007 Biological, Translational, and Clinical Language Processing - BioNLP '07 2007
DOI: 10.3115/1572392.1572400

Combining multiple evidence for gene symbol disambiguation

Abstract: Gene names and symbols are important biomedical entities, but are highly ambiguous. This ambiguity affects the performance of both information extraction and information retrieval systems in the biomedical domain. Existing knowledge sources contain different types of information about genes and could be used to disambiguate gene symbols. In this paper, we applied an information retrieval (IR) based method for human gene symbol disambiguation and studied different methods to combine various types of information…

Cited by 4 publications (10 citation statements)
References 27 publications
“…Several existing works [9], [10], [11] perform local species disambiguation, assigning each gene mention in the document a mention-specific species tag, while Kappeler et al [7] focuses on global species disambiguation for the document as a whole. In Wang and Matthews [11], species detection is tackled with a maximum entropy machine learning model based on document context features, such as the words (or more specifically the nouns or adjectives) to the left or right of an entity mention, and species words and IDs identified in the document.…”
Section: Gene Normalization (mentioning)
confidence: 99%
“…We used two evaluation metrics in our study, namely precision and coverage. They are the standard measures in the document classification community and this allowed us to make a direct comparison between our results and those in [14, 15]. As the goal of our first set of approaches was to construct a system with good precision and then extend its results to obtain full coverage, we decided to examine both measures and not apply their aggregation (like the F measure).…”
Section: Results (mentioning)
confidence: 99%
“…It utilises the synonyms of the target gene name which are present in the document of the test gene. In this study we present experimental results on the GSD datasets built by Xu et al [14, 15]. In [14] Xu and his colleagues took the words of the abstracts, the MeSH codes provided along with the MedLine articles, the words of the texts and some computer tagged information (UMLS CUIs and biomedical entities) as features while in [15] they experimented with the use of combinations of these features.…”
Section: Introduction (mentioning)
confidence: 99%