DNorm: disease name normalization with pairwise learning to rank

Leaman, Robert; Doğan, Rezarta Islamaj; Lu, Zhiyong

doi:10.1093/bioinformatics/btt474

Cited by 436 publications

(411 citation statements)

References 31 publications

Supporting

Mentioning

405

Contrasting

Unclassified


“…We have shown that our approach can be easily applied to different domains by merely exchanging the underlying ontology and training data. On the task of recognizing and linking disease names, we show that our approach outperforms the state-of-the-art systems DNorm [12] and TaggerOne [11], as well as two lexicon-based baselines. On the task of recognizing and linking chemical names, our system achieves comparable performance to the state-of-the-art.…”

Section: Results (mentioning)

confidence: 89%

“…We apply the same model to both problems, only exchanging the underlying reference knowledge base. With an F1 score of 85.9 in disease linking, we outperform the state-of-the-art systems DNorm [12] and TaggerOne [11]; in chemical compounds linking, our system achieves an F1 score of 86.6, which is comparable to the state-of-the-art. Thus, J-Link provides high performance on both domains without major need of manual adaptation or system tuning.…”

Section: Introduction (mentioning)

confidence: 79%

“…We focus our discussion on the biomedical domain and disease/chemical recognition and linking, as this is our application scenario in this paper. The DNorm system [12] relies on a learning-to-rank approach in order to induce similarities between disease mentions and concept names directly from training data. However, the system does not include any information about coherence between different entities within the same text.…”

Section: Related Work (mentioning)

confidence: 99%

“…We compare our approach to the two state-of-the-art systems DNorm [12] and TaggerOne [11], as well as against two simple baselines (LMB and LMB+). The latter baselines are based on non-overlapping longest matches, using the dictionary as described in Section 3.5.…”

Section: Baselines (mentioning)

confidence: 99%


Joint Entity Recognition and Linking in Technical Domains Using Undirected Probabilistic Graphical Models

Horst

Hartung

Cimiano

2017

Lecture Notes in Computer Science


Abstract. The problems of recognizing mentions of entities in texts and linking them to unique knowledge base identifiers have received considerable attention in recent years. In this paper we present a probabilistic system based on undirected graphical models that jointly addresses both the entity recognition and the linking task. Our framework considers the span of mentions of entities as well as the corresponding knowledge base identifier as random variables and models the joint assignment using a factorized distribution. We show that our approach can be easily applied to different technical domains by merely exchanging the underlying ontology. On the task of recognizing and linking disease names, we show that our approach outperforms the state-of-the-art systems DNorm and TaggerOne, as well as two strong lexicon-based baselines. On the task of recognizing and linking chemical names, our system achieves comparable performance to the state-of-the-art.


“…For example, in the results section of one article (PMCID: PMC3910500), it described the genomic landscape of glioblastoma using the whole-exome (WES), whole-genome sequencing (WGS), and RNA-Sequencing (RNA) (Brennan et al, 2013). To identify the TCGA cancer type and high-throughput platform concept from the free texts, we developed a named entity recognition method that is based on a biomedical text mining tool (Leaman, Islamaj, & Lu, 2013).…”

Section: TCGA Data Usage Analysis (mentioning)

confidence: 99%

Identifying Scientific Project-generated Data Citation from Full-text Articles: An Investigation of TCGA Data Citation

Zheng

Kang

et al. 2016

Journal of Data and Information Science


Purpose: In the open science era, it is typical to share project-generated scientific data by depositing it in an open and accessible database. Moreover, scientific publications are preserved in a digital library archive. It is challenging to identify the data usage that is mentioned in literature and associate it with its source. Here, we investigated the data usage of a government-funded cancer genomics project, The Cancer Genome Atlas (TCGA), via a full-text literature analysis. Design/methodology/approach: We focused on identifying articles using the TCGA dataset and constructing linkages between the articles and the specific TCGA dataset. First, we collected 5,372 TCGA-related articles from PubMed Central (PMC). Second, we constructed a benchmark set with 25 full-text articles that truly used the TCGA data in their studies, and we summarized the key features of the benchmark set. Third, the key features were applied to the remaining PMC full-text articles that were collected from PMC. Findings: The amount of publications that use TCGA data has increased significantly since 2011, although the TCGA project was launched in 2005. Additionally, we found that the critical areas of focus in the studies that use the TCGA data were glioblastoma multiforme, lung cancer, and breast cancer; meanwhile, data from the RNA-sequencing (RNA-seq) platform is the most preferable for use.


Computational Biology Approaches to Support Biomarker Discovery and Development

Shin

Trepicchio

et al. 2020

Biomarkers in Drug Discovery and Development

