Computers in chemistryComputers in chemistry V 0380 Active Learning with Support Vector Machines in the Drug Discovery Process. -(WARMUTH*, M. K.; LIAO, J.; RAETSCH, G.; MATHIESON, M.; PUTTA, S.; LEMMEN, C.; J. Chem. Inf. Comput. Sci. 43 (2003) 2, 667-673; Comp. Sci. Dep., Univ. Calif., Santa Cruz, CA 95064, USA; Eng.) -Lindner 22-232
Cancers develop through somatic mutagenesis, however germline genetic variation can markedly contribute to tumorigenesis via diverse mechanisms. We discovered and phased 88 million germline single nucleotide variants, short insertions/deletions, and large structural variants in whole genomes from 2,642 cancer patients, and employed this genomic resource to study genetic determinants of somatic mutagenesis across 39 cancer types. Our analyses implicate damaging germline variants in a variety of cancer predisposition and DNA damage response genes with specific somatic mutation patterns. Mutations in the MBD4 DNA glycosylase gene showed association with elevated C>T mutagenesis at CpG dinucleotides, a ubiquitous mutational process acting across tissues. Analysis of somatic structural variation exposed complex rearrangement patterns, involving cycles of templated insertions and tandem duplications, in BRCA1-deficient tumours. Genome-wide association analysis implicated common genetic variation at the APOBEC3 gene cluster with reduced basal levels of somatic mutagenesis attributable to APOBEC cytidine deaminases across cancer types. We further inferred over a hundred polymorphic L1/LINE elements with somatic retrotransposition activity in cancer. Our study highlights the major impact of rare and common germline variants on mutational landscapes in cancer.
MotivationDeep learning techniques have yielded tremendous progress in the field of computational biology over the last decade, however many of these techniques are opaque to the user. To provide interpretable results, methods have incorporated biological priors directly into the learning task; one such biological prior is pathway structure. While pathways represent most biological processes in the cell, the high level of correlation and hierarchical structure make it complicated to determine an appropriate computational representation.ResultsHere, we present pathway module Variational Autoencoder (pmVAE). Our method encodes pathway information by restricting the structure of our VAE to mirror gene-pathway memberships. Its architecture is composed of a set of subnetworks, which we refer to as pathway modules. The subnetworks learn interpretable latent representations by factorizing the latent space according to pathway gene sets. We directly address correlation between pathways by balancing a module-specific local loss and a global reconstruction loss. Furthermore, since many pathways are by nature hierarchical and therefore the product of multiple downstream signals, we model each pathway as a multidimensional vector. Due to their factorization over pathways, the representations allow for easy and interpretable analysis of multiple downstream effects, such as cell type and biological stimulus, within the contexts of each pathway. We compare pmVAE against two other state-of-the-art methods on two single-cell RNA-seq case-control data sets, demonstrating that our pathway representations are both more discriminative and consistent in detecting pathways targeted by a perturbation.Availability and implementationhttps://github.com/ratschlab/pmvae
Cancer is characterised by somatic genetic variation, but the effect of the majority of non-coding somatic variants and the interface with the germline genome are still unknown. We analysed the whole genome and RNA-Seq data from 1,188 human cancer patients as provided by the Pan-cancer Analysis of Whole Genomes (PCAWG) project to map cis expression quantitative trait loci of somatic and germline variation and to uncover the causes of allele-specific expression patterns in human cancers. The availability of the first large-scale dataset with both whole genome and gene expression data enabled us to uncover the effects of the non-coding variation on cancer. In addition to confirming known regulatory effects, we identified novel associations between somatic variation and expression dysregulation, in particular in distal regulatory elements. Finally, we uncovered links between somatic mutational signatures and gene expression changes, including TERT and LMO2 , and we explained the inherited risk factors in APOBEC-related mutational processes. This work represents the first large-scale assessment of the effects of both germline and somatic genetic variation on gene expression in cancer and creates a valuable resource cataloguing these effects.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.