PURPOSE Precision oncology depends on the availability of up-to-date, comprehensive, and accurate information about associations between genetic variants and therapeutic options. Recently, a number of knowledge bases (KBs) have been developed that gather such information on the basis of expert curation of the scientific literature. We performed a quantitative and qualitative comparison of Clinical Interpretations of Variants in Cancer, OncoKB, Cancer Gene Census, Database of Curated Mutations, CGI Biomarkers (the cancer genome interpreter biomarker database), Tumor Alterations Relevant for Genomics-Driven Therapy, and the Precision Medicine Knowledge Base. METHODS We downloaded each KB and restructured their content to describe variants, genes, drugs, and gene-drug associations in a common format. We normalized gene names to Entrez Gene IDs and drug names to ChEMBL and DrugBank IDs. For the analysis of clinically relevant gene-drug associations, we obtained lists of genes affected by genetic alterations and putative drug therapies for 113 patients with cancer whose cases were presented at the Molecular Tumor Board (MTB) of the Charité Comprehensive Cancer Center. RESULTS Our analysis revealed that the KBs are largely overlapping but also that each source harbors a notable amount of unique information. Although some KBs cover more genes, others contain more data about gene-drug associations. Retrospective comparisons with findings of the Charitè MTB at the gene level showed that use of multiple KBs may considerably improve retrieval results. The relative importance of a KB in terms of cancer genes was assessed in more detail by logistic regression, which revealed that all but one source had a notable impact on result quality. We confirmed these findings using a second data set obtained from an independent MTB. CONCLUSION To date, none of the existing publicly available KBs on gene-drug associations in precision oncology fully subsumes the others, but all of them exhibit specific strengths and weaknesses. Consideration of multiple KBs, therefore, is essential to obtain comprehensive results.
BackgroundThe decreasing cost of obtaining high-quality calls of genomic variants and the increasing availability of clinically relevant data on such variants are important drivers for personalized oncology. To allow rational genome-based decisions in diagnosis and treatment, clinicians need intuitive access to up-to-date and comprehensive variant information, encompassing, for instance, prevalence in populations and diseases, functional impact at the molecular level, associations to druggable targets, or results from clinical trials. In practice, collecting such comprehensive information on genomic variants is difficult since the underlying data is dispersed over a multitude of distributed, heterogeneous, sometimes conflicting, and quickly evolving data sources. To work efficiently, clinicians require powerful Variant Information Systems (VIS) which automatically collect and aggregate available evidences from such data sources without suppressing existing uncertainty.MethodsWe address the most important cornerstones of modeling a VIS: We take from emerging community standards regarding the necessary breadth of variant information and procedures for their clinical assessment, long standing experience in implementing biomedical databases and information systems, our own clinical record of diagnosis and treatment of cancer patients based on molecular profiles, and extensive literature review to derive a set of design principles along which we develop a relational data model for variant level data. In addition, we characterize a number of public variant data sources, and describe a data integration pipeline to integrate their data into a VIS.ResultsWe provide a number of contributions that are fundamental to the design and implementation of a comprehensive, operational VIS. In particular, we (a) present a relational data model to accurately reflect data extracted from public databases relevant for clinical variant interpretation, (b) introduce a fault tolerant and performant integration pipeline for public variant data sources, and (c) offer recommendations regarding a number of intricate challenges encountered when integrating variant data for clincal interpretation.ConclusionThe analysis of requirements for representation of variant level data in an operational data model, together with the implementation-ready relational data model presented here, and the instructional description of methods to acquire comprehensive information to fill it, are an important step towards variant information systems for genomic medicine.Electronic supplementary materialThe online version of this article (10.1186/s12911-018-0665-z) contains supplementary material, which is available to authorized users.
Background Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient’s genome. This analysis relies on previously published information regarding the association of variations to disease progression and possible interventions. Clinicians to a large degree use biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact. We develop the Variant-Information Search Tool (VIST), a search engine designed for the targeted search of clinically relevant publications given an oncological mutation profile. Results VIST indexes all PubMed abstracts and content from ClinicalTrials.gov. It applies advanced text mining to identify mentions of genes, variants and drugs and uses machine learning based scoring to judge the clinical relevance of indexed abstracts. Its functionality is available through a fast and intuitive web interface. We perform several evaluations, showing that VIST’s ranking is superior to that of PubMed or a pure vector space model with regard to the clinical relevance of a document’s content. Conclusion Different user groups search repositories of scientific publications with different intentions. This diversity is not adequately reflected in the standard search engines, often leading to poor performance in specialized settings. We develop a search engine for the specific case of finding documents that are clinically relevant in the course of cancer treatment. We believe that the architecture of our engine, heavily relying on machine learning algorithms, can also act as a blueprint for search engines in other, equally specific domains. VIST is freely available at https://vist.informatik.hu-berlin.de/ Electronic supplementary material The online version of this article (10.1186/s12859-019-2958-3) contains supplementary material, which is available to authorized users.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.