2008
DOI: 10.1186/1471-2105-9-69

The strength of co-authorship in gene name disambiguation

Abstract: Background: A biomedical entity mention in articles and other free texts is often ambiguous. For example, 13% of the gene names (aliases) might refer to more than one gene. The task of Gene Symbol Disambiguation (GSD), a special case of Word Sense Disambiguation (WSD), is to assign a unique gene identifier for all identified gene name aliases in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. We examine here…

Cited by 9 publications (5 citation statements)
References: 12 publications

“…These can then be used to train a classifier to distinguish the correct identifier from incorrect ones [ 58 ]. Knowledge of paper co-authorship has been found to be useful in identifier disambiguation,[ 59 ] based on the idea that an author uses gene names consistently across all of their publications or may work on a specific set of genes consistently.…”
Section: Entity Normalisation (mentioning)
confidence: 99%
“…Author disambiguation has been used for diverse applications such as building social networks, normalizing gene names and analyzing collaborations. Farkas (35) was successful in using authors' information for improving the accuracy of a baseline gene normalization system from 80% to 97%. Large scale social network analysis of disambiguated author information is useful for finding key scientific leaders who are "low publishers" in scientific journals (36).…”
Section: Limitations (mentioning)
confidence: 99%
“…In the biomedical domain researchers have focused on supervised methods [8][9][10][11] and using established knowledge [12][13][14][15] to perform gene name normalization and resolve abbreviations. According to the recent BioCreAtIvE challenge, the former problem can be solved with up to 81% success rate [14] for human genes, which are challenging with 5.5 synonyms per name (therefore many genes are named identically).…”
Section: Algorithms For Word Sense Disambiguation (mentioning)
confidence: 99%
“…The above approaches use cosine similarity [12], SVM [10,11], Bayes, decision trees, induced rules [8], and background knowledge sources such as the Unified Medical Language System (UMLS) [16], Medical Subject Headings (MeSH) [17], and the Gene Ontology (GO) [18]. Two approaches use metadata, such as authors [15] and Journal Descriptor Indexing [13]. Most of the unsupervised approaches so far were evaluated outside the biomedical domain [19][20][21][22][23][24][25], with the exception of [26], who used relations between terms given by the UMLS for unsupervised WSD of medical documents and achieved 74% precision and 49% recall.…”
Section: Algorithms For Word Sense Disambiguation (mentioning)
confidence: 99%