The AACR Project GENIE is an international data-sharing consortium focused on generating an evidence base for precision cancer medicine by integrating clinical-grade cancer genomic data with clinical outcome data for tens of thousands of cancer patients treated at multiple institutions worldwide. In conjunction with the first public data release from approximately 19,000 samples, we describe the goals, structure, and data standards of the consortium and report conclusions from high-level analysis of the initial phase of genomic data. We also provide examples of the clinical utility of GENIE data, such as an estimate of clinical actionability across multiple cancer types (>30%) and prediction of accrual rates to the NCI-MATCH trial that accurately reflect recently reported actual match rates. The GENIE database is expected to grow to >100,000 samples within 5 years and should serve as a powerful tool for precision cancer medicine.
Significance: The AACR Project GENIE aims to catalyze sharing of integrated genomic and clinical datasets across multiple institutions worldwide, and thereby enable precision cancer medicine research, including the identification of novel therapeutic targets, design of biomarker-driven clinical trials, and identification of genomic determinants of response to therapy.
The successes of targeted drugs with companion predictive biomarkers and the technological advances in gene sequencing have generated enthusiasm for evaluating personalized cancer medicine strategies using genomic profiling. We assessed the feasibility of incorporating real-time analysis of somatic mutations within exons of 19 genes into patient management. Blood, tumor biopsy and archived tumor samples were collected from 50 patients recruited from four cancer centers. Samples were analyzed using three technologies: targeted exon sequencing using Pacific Biosciences PacBio RS, multiplex somatic mutation genotyping using Sequenom MassARRAY and Sanger sequencing. An expert panel reviewed results prior to reporting to clinicians. A clinical laboratory verified actionable mutations. Fifty patients were recruited. Nineteen actionable mutations were identified in 16 (32%) patients. Across technologies, results were in agreement in 100% of biopsy specimens and 95% of archival specimens. Profiling results from paired archival/biopsy specimens were concordant in 30/34 (88%) patients. We demonstrated that the use of next-generation sequencing for real-time genomic profiling in advanced cancer patients is feasible. Additionally, actionable mutations identified in this study were relatively stable between archival and biopsy samples, implying that cancer mutations that are good predictors of drug response may remain constant across clinical stages.
International efforts to quantify and catalogue mutations, gene expression and epigenetic data for multiple forms of cancer, coupled with the successes of targeted agents in patients with molecularly defined tumors and improvements in genomic technology, have increased enthusiasm to adopt genomic profiling into clinical cancer practice. 1 As the numbers of clinically significant genetic variants have increased, clinical testing has evolved, moving from single mutations to multiplex hotspot evaluations in multiple cancer genes.
Several pilot studies have demonstrated the feasibility and potential benefits of real-time multiplex hotspot evaluations in various cancer types. 2-8 However, as improvements in genomic technology overcome previous concerns of cost, complexity, time and tissue requirements, interest in adopting next-generation sequencing (NGS) for genomic profiling in clinical cancer practice has grown. 6,9,10 Roychowdhury et al. recently reported the use of integrative sequencing in the clinic and demonstrated its potential to facilitate biomarker-driven clinical trials. 11 However, it remains unclear whether the use of high-throughput, real-time NGS for genomic profiling is capable of generating results in a timeframe that allows for changes to patient management. Furthermore, the additional value of NGS over the multiplex hotspot genotyping approach is unclear, and
Background: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools.
Results: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high-performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage.
Conclusions: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro.
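To make the notion of a genome coverage metric concrete, the following is a minimal sketch of one common definition, breadth of coverage: the fraction of reference positions covered by at least one aligned read. It is not CaPSID's implementation. The function name `breadth_of_coverage`, the half-open 0-based interval convention, and the toy intervals are assumptions made for illustration; in practice the intervals would be extracted from a BAM file.

```python
def breadth_of_coverage(intervals, genome_length):
    """Fraction of a reference covered by at least one aligned read.

    `intervals` is an iterable of (start, end) pairs, 0-based and half-open.
    This is an illustrative definition, not CaPSID's actual metric code.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        # Clip each alignment to the reference boundaries.
        start = max(start, 0)
        end = min(end, genome_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # A gap was found: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent read: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical alignments against a 400-base reference.
reads = [(0, 50), (40, 100), (150, 200)]
print(breadth_of_coverage(reads, 400))
```

Merging overlapping intervals before counting ensures that positions hit by several reads are counted once, which is what distinguishes breadth from depth of coverage.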
Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized.
Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences.
Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html.
Contact: ivan.borozan@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
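The idea of a per-sequence weighted combination of similarity measures can be sketched as follows. This is an illustrative toy, not the authors' model: the abstract does not give the weighting formula, so the sketch assumes that a measure's weight for a given query is the margin between its best and second-best class scores. The two measures (`kmer_similarity` as an alignment-free score and `identity_similarity` as a crude stand-in for an alignment-based score), the function `classify`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_similarity(a, b, k=2):
    """Alignment-free measure: cosine similarity of k-mer count vectors."""
    pa = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    pb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    dot = sum(count * pb[kmer] for kmer, count in pa.items())
    norm_a = sum(c * c for c in pa.values()) ** 0.5
    norm_b = sum(c * c for c in pb.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identity_similarity(a, b):
    """Crude stand-in for an alignment-based score: ungapped identity."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def classify(query, training, measures):
    """Return (predicted_label, combined_scores) for one query sequence.

    `training` is a list of (sequence, label) pairs. Weights are computed
    separately for each query; the margin-based weight is an assumption.
    """
    labels = sorted({label for _, label in training})
    combined = {label: 0.0 for label in labels}
    for measure in measures:
        # Best score this measure gives the query against each class.
        best = {label: 0.0 for label in labels}
        for seq, label in training:
            best[label] = max(best[label], measure(query, seq))
        ranked = sorted(best.values(), reverse=True)
        # Assumed weight: how clearly this measure separates the top class.
        weight = ranked[0] - ranked[1] if len(ranked) > 1 else 1.0
        for label in labels:
            combined[label] += weight * best[label]
    return max(labels, key=lambda label: combined[label]), combined


# Hypothetical training data with two classes.
training = [
    ("ACGTACGTAC", "A"), ("ACGTACGTAA", "A"),
    ("TTTTGGGGCC", "B"), ("TTTTGGGGCA", "B"),
]
label, scores = classify("ACGTACGTAG", training,
                         [kmer_similarity, identity_similarity])
print(label, scores)
```

Under this assumed scheme, a measure that cannot separate the candidate classes for a particular query contributes little to that query's combined score, which mirrors the abstract's point that different sequences may be best served by different measures.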