Background: Identification of gene and protein names in biomedical text is a challenging task as the corresponding nomenclature has evolved over time. This has led to multiple synonyms for individual genes and proteins, as well as names that may be ambiguous with other gene names or with general English words. The Gene List Task of the BioCreAtIvE challenge evaluation enables comparison of systems addressing the problem of protein and gene name identification on common benchmark data.
Background: In order to retrieve useful information from scientific literature and electronic medical records (EMR), we developed an ontology specific for Multiple Sclerosis (MS). Methods: The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS Ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of the MS Ontology. Results: Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease-modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. Conclusion: The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.
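The abstract above describes a synonym dictionary used for mining EMR text and an F score used for validation. As a rough illustration only, the sketch below shows how dictionary-based concept tagging and an F score could be computed in general. The dictionary entries, the sample note, the gold annotations, and the function names `tag_concepts` and `f_score` are all hypothetical; none of them come from the MS Ontology or SCAIView, and this is not the authors' evaluation procedure.

```python
import re

# Hypothetical mini-dictionary mapping a concept to its synonyms.
# These entries are illustrative and are NOT taken from the MS Ontology.
DICTIONARY = {
    "fingolimod": ["fingolimod", "FTY720"],
    "mitoxantrone": ["mitoxantrone"],
    "depression": ["depression", "depressive disorder"],
}


def tag_concepts(text, dictionary=DICTIONARY):
    """Return the concepts that have at least one synonym in the text.

    Matching is case-insensitive and restricted to whole words.
    """
    found = set()
    for concept, synonyms in dictionary.items():
        for synonym in synonyms:
            pattern = r"\b" + re.escape(synonym) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(concept)
                break  # one matching synonym is enough for this concept
    return found


def f_score(predicted, gold):
    """Balanced F score (harmonic mean of precision and recall) over two sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up note and gold annotations, for illustration only.
note = "Patient on FTY720; reports fatigue and depressive disorder."
gold = {"fingolimod", "depression", "fatigue"}

predicted = tag_concepts(note)
print(sorted(predicted), round(f_score(predicted, gold), 2))
```

In this toy case the dictionary has no entry for fatigue, so recall drops while precision stays perfect; a dictionary with broader synonym coverage would raise recall, which is the role the abstract assigns to the semantic synonyms and translations.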
The influence of genetic variations on diseases or cellular processes is the main focus of many investigations, and results of biomedical studies are often only accessible through scientific publications. Automatic extraction of this information requires recognition of the gene names and the accompanying allelic variant information. Previous work described the OSIRIS system, which detects allelic variation in text using a query expansion approach. The main limitation of this system is its relatively low recall for variation mentions and gene name recognition. To address this limitation, we integrate the ProMiner system, developed for the recognition and normalization of gene and protein names, with a conditional random field (CRF)-based recognition of variation terms in biomedical text. Using a newly developed normalization of variation entities, we can link textual entities to Single Nucleotide Polymorphism database (dbSNP) entries. The performance of this novel approach is evaluated, and improved results in comparison to state-of-the-art systems are reported.
Molecular signaling pathways have long been used to demonstrate interactions among upstream causal molecules and downstream biological effects. They show the signal flow between cell compartments, the majority of which are represented as cartoons. These are often drawn manually by scanning through the literature, which is time-consuming, static, and non-interoperable. Moreover, these pathways are often devoid of context (condition and tissue) and biased toward certain disease conditions. Mining the scientific literature creates new possibilities to retrieve pathway information at higher contextual resolution and specificity. To address this challenge, we have created a pathway terminology system by combining signaling pathways and biological events to ensure a broad coverage of the entire pathway knowledge domain. This terminology was applied to mining biomedical papers and patents about neurodegenerative diseases with a focus on Alzheimer's disease. We demonstrate the power of our approach by mapping literature-derived signaling pathways onto their corresponding anatomical regions in the human brain under healthy and Alzheimer's disease states. We demonstrate how this knowledge resource can be used to identify a putative mechanism explaining the mode-of-action of the approved drug Rasagiline, and show how this resource can be used for fingerprinting patents to support the discovery of pathway knowledge for Alzheimer's disease. Finally, we propose that, based on next-generation cause-and-effect pathway models, a dedicated inventory of computer-processable pathway models specific to neurodegenerative diseases can be established, which we hope will accelerate context-specific enrichment analysis of experimental data with higher resolution and richer annotations.