The pan-cancer analysis of whole genomes The expansion of whole-genome sequencing studies from individual ICGC and TCGA working groups presented the opportunity to undertake a meta-analysis of genomic features across tumour types. To achieve this, the PCAWG Consortium was established. A Technical Working Group implemented the informatics analyses by aggregating the raw sequencing data from different working groups that studied individual tumour types, aligning the sequences to the human genome and delivering a set of high-quality somatic mutation calls for downstream analysis (Extended Data Fig. 1). Given the recent meta-analysis
Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable due to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads allowing for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy-number differences. We estimate that 73–87 genes will be on average copy-number variable between two human genomes and find that these genic differences overwhelmingly correspond to segmental duplications (OR=135; p<2.2e-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate census of gene content and insight into functional constraint without the limitations of array-based technology.
Cancer develops through a process of somatic evolution 1,2. Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes 3. Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) 4 , we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection. Similar to the evolution in species, the approximately 10 14 cells in the human body are subject to the forces of mutation and selection 1. This process of somatic evolution begins in the zygote and only comes to rest at death, as cells are constantly exposed to mutagenic stresses, introducing 1-10 mutations per cell division 2. These mutagenic forces lead to a gradual accumulation of point mutations throughout life, observed in a range of healthy tissues 5-11 and cancers 12. Although these mutations are predominantly selectively neutral passenger mutations, some are proliferatively advantageous driver mutations 13. The types of mutation in cancer genomes are well studied, but little is known about the times when these lesions arise during somatic evolution and where the boundary between normal evolution and cancer progression should be drawn. Sequencing of bulk tumour samples enables partial reconstruction of the evolutionary history of individual tumours, based on the catalogue of somatic mutations they have accumulated 3,14,15. These inferences include timing of chromosomal gains during early somatic evolution 16 , phylogenetic analysis of late cancer evolution using matched primary and metastatic tumour samples from individual patients 17-20 , and temporal ordering of driver mutations across many samples 21,22 .
SummaryCancer develops through a process of somatic evolution. Here, we use whole-genome sequencing of 2,778 tumour samples from 2,658 donors to reconstruct the life history, evolution of mutational processes, and driver mutation sequences of 39 cancer types. The early phases of oncogenesis are driven by point mutations in a small set of driver genes, often including biallelic inactivation of tumour suppressors. Early oncogenesis is also characterised by specific copy number gains, such as trisomy 7 in glioblastoma or isochromosome 17q in medulloblastoma. By contrast, increased genomic instability, a nearly four-fold diversification of driver genes, and an acceleration of point mutation processes are features of later stages. Copy-number alterations often occur in mitotic crises leading to simultaneous gains of multiple chromosomal segments. Timing analysis suggests that driver mutations often precede diagnosis by many years, and in some cases decades, providing a window of opportunity for early cancer detection.
Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and have trained an adaboost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.