Whole-genome sequencing (WGS) permits comprehensive cancer genome analyses, revealing mutational signatures, imprints of DNA damage, and repair processes that have arisen in each patient’s cancer. We performed mutational signature analyses on 12,222 whole-genome–sequenced tumor-normal matched pairs from patients recruited via the UK National Health Service (NHS). We contrasted our results with two independent cancer WGS datasets—from the International Cancer Genome Consortium (ICGC) and the Hartwig Medical Foundation (HMF)—involving 18,640 whole-genome–sequenced cancers in total. Our analyses add 40 single and 18 double substitution signatures to the current mutational signature tally. We show for each organ that cancers have a limited number of common signatures and a long tail of rare signatures, and we provide a practical solution for applying this concept of common versus rare signatures to future analyses.
The identification of causal variants in sequencing studies remains a considerable challenge that can be partially addressed by new gene-specific knowledge. Here, we integrate measures of how essential a gene is to supporting life, as inferred from viability and phenotyping screens performed on knockout mice by the International Mouse Phenotyping Consortium and essentiality screens carried out on human cell lines. We propose a cross-species gene classification across the Full Spectrum of Intolerance to Loss-of-function (FUSIL) and demonstrate that genes in five mutually exclusive FUSIL categories have differing biological properties. Most notably, Mendelian disease genes, particularly those associated with developmental disorders, are highly overrepresented among genes non-essential for cell survival but required for organism development. After screening developmental disorder cases from three independent disease sequencing consortia, we identify potentially pathogenic variants in genes not previously associated with rare diseases. We therefore propose FUSIL as an efficient approach for disease gene discovery.
The mutational landscape is shaped by many processes. Genic regions are vulnerable to mutation but are preferentially protected by transcription-coupled repair1. In microorganisms, transcription has been demonstrated to be mutagenic2,3; however, the impact of transcription-associated mutagenesis remains to be established in higher eukaryotes4. Here we show that ID4—a cancer insertion–deletion (indel) mutation signature of unknown aetiology5 characterized by short (2 to 5 base pair) deletions —is due to a transcription-associated mutagenesis process. We demonstrate that defective ribonucleotide excision repair in mammals is associated with the ID4 signature, with mutations occurring at a TNT sequence motif, implicating topoisomerase 1 (TOP1) activity at sites of genome-embedded ribonucleotides as a mechanistic basis. Such TOP1-mediated deletions occur somatically in cancer, and the ID-TOP1 signature is also found in physiological settings, contributing to genic de novo indel mutations in the germline. Thus, although topoisomerases protect against genome instability by relieving topological stress6, their activity may also be an important source of mutations in the human genome.
Cerebellar Ataxia, Neuropathy and Vestibular Areflexia Syndrome (CANVAS) is an autosomal recessive neurodegenerative disease, usually caused by biallelic AAGGG repeat expansions in RFC1. In this study, we leveraged whole genome sequencing (WGS) data from nearly 10,000 individuals recruited within the Genomics England sequencing project to investigate the normal and pathogenic variation of the RFC1 repeat. We identified three novel repeat motifs, AGGGC (n=6 from 5 families), AAGGC (n=2 from 1 family), AGAGG (n=1), associated with CANVAS in the homozygous or compound heterozygous state with the common pathogenic AAGGG expansion. While AAAAG, AAAGGG and AAGAG expansions appear to be benign, here we show a pathogenic role for large AAAGG repeat configuration expansions (n=5). Long read sequencing was used to fully characterise the entire repeat sequence and revealed a pure AGGGC expansion in six patients, whereas the other patients presented complex motifs with AAGGG or AAAGG interruptions. All pathogenic motifs seem to have arisen from a common haplotype and are predicted to form highly stable G quadruplexes, which have been previously demonstrated to affect gene transcription in other conditions. The assessment of these novel configurations is warranted in CANVAS patients with negative or inconclusive genetic testing. Particular attention should be paid to carriers of compound AAGGG/AAAGG expansions, since the AAAGG motif when very large (>500 repeats) or in the presence of AAGGG interruptions. Accurate sizing and full sequencing of the satellite repeat with long read is recommended in clinically selected cases, in order to achieve an accurate molecular diagnosis and counsel patients and their families.
The value of genome-wide over targeted driver analyses for predicting clinical outcomes of cancer patients is debated. Here, we report the whole-genome sequencing of 485 chronic lymphocytic leukemia patients enrolled in clinical trials as part of the United Kingdom’s 100,000 Genomes Project. We identify an extended catalog of recurrent coding and noncoding genetic mutations that represents a source for future studies and provide the most complete high-resolution map of structural variants, copy number changes and global genome features including telomere length, mutational signatures and genomic complexity. We demonstrate the relationship of these features with clinical outcome and show that integration of 186 distinct recurrent genomic alterations defines five genomic subgroups that associate with response to therapy, refining conventional outcome prediction. While requiring independent validation, our findings highlight the potential of whole-genome sequencing to inform future risk stratification in chronic lymphocytic leukemia.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.