Analyses of large somatic mutation datasets, using advanced computational algorithms, have revealed at least 30 independent mutational signatures in tumor samples. These studies have been instrumental in the identification and quantification of the endogenous and exogenous molecular processes responsible for mutations in cancer. The quantitative approach used to deconvolute mutational signatures is becoming an integral part of cancer research. Therefore, development of a stand-alone tool with a user-friendly graphical interface for analysis of cancer mutational signatures is necessary. In this manuscript, we introduce CANCERSIGN as an open access bioinformatics tool that uses raw mutation data (BED files) as input and identifies 3-mer and 5-mer mutational signatures. CANCERSIGN enables users to identify signatures within whole genome, whole exome or pooled samples. It can also identify signatures in specific, user-defined regions of the genome. Additionally, this tool enables users to cluster tumor samples based on the raw mutation counts as well as on the proportion of mutational signatures in each sample. Using this tool, we analysed all the whole genome somatic mutation datasets profiled by the International Cancer Genome Consortium (ICGC) and identified a number of novel signatures. By examining signatures found in exonic and non-exonic regions of the genome using WGS, and comparing these to signatures found in WES data, we observe that WGS can identify additional signatures that are enriched in the non-coding regions of the genome, while the deeper sequencing of WES may help identify weak signatures that are otherwise missed in shallower WGS data.

We tried to apply CANCERSIGN, SomaticSignatures and SigneR to the simulated dataset under equal conditions. However, due to its Bayesian framework, SigneR is not scalable to large mutational catalogues containing several hundreds of samples. Consequently, it was not feasible to test SigneR on the simulated dataset, and we instead compared CANCERSIGN with SomaticSignatures.

The parameters of the analysis were set as follows. The range of values of N (number of signatures to decipher) was set from 2 to 12. The maximum number of bootstraps for each N was set to 100 for CANCERSIGN, and the number of replicates (nReplicates) for SomaticSignatures was set to 20. With these settings, the tools took approximately the same amount of time (~45 minutes) to decipher mutational signatures from our simulated dataset (using a typical computer with four 1.7 GHz CPU cores and 8 GB memory). According to Figure 10, both tools correctly identified N = 6 as the optimal number of underlying mutational signatures (the knee point in the diagram of summary statistics of SomaticSignatures [6], and the point with a high reproducibility and the lowest reconstruction error in the evaluation diagram of CANCERSIGN). The obtained mutational signatures are shown in Figure 11. By a simple visual comparison, we can conclude that both tools have deciphered almost identical s...
Analysis of cancer mutational signatures has been instrumental in the identification of the endogenous and exogenous molecular processes responsible for mutations in cancer. The quantitative approach used to deconvolute mutational signatures is becoming an integral part of cancer research. Therefore, development of a stand-alone tool with a user-friendly interface for analysis of cancer mutational signatures is necessary. In this manuscript we introduce CANCERSIGN, which enables users to identify 3-mer and 5-mer mutational signatures within whole genome, whole exome or pooled samples. Additionally, this tool enables users to cluster tumor samples based on the proportion of mutational signatures in each sample. Using CANCERSIGN, we analysed all the whole genome somatic mutation datasets profiled by the International Cancer Genome Consortium (ICGC) and identified a number of novel signatures. By examining signatures found in exonic and non-exonic regions of the genome using WGS, and comparing these to signatures found in WES data, we observe that WGS can identify additional signatures that are enriched in the non-coding regions of the genome, while the deeper sequencing of WES may help identify weak signatures that are otherwise missed in shallower WGS data.

Aberrant somatic changes in DNA resulting from endogenous sources (e.g. APOBEC-induced mutagenesis and DNA repair defects) and exogenous factors (e.g. tobacco smoking and UV radiation) are a hallmark of cancer. These alterations in DNA may take different forms, ranging from gross chromosomal rearrangements to single base substitutions [1]. Whole genome sequencing of tumor cells has shown that the number of mutations varies from fewer than one hundred per genome to hundreds of thousands, depending on the cancer type and patient. Moreover, the type of mutation and the sequence context of many cancer mutations are not random. For instance, the C-to-T mutation within the CG (a.k.a. CpG) dinucleotide is prevalent in cancer and, because its abundance is proportional to the age of the patient, it is referred to as an "aging" signature [2]. Many cancers also have a large number of C-to-T and C-to-G mutations within TCA and TCT trinucleotides [3]. These mutations are attributed to aberrant changes in the level and activity of APOBEC enzymes. The mutational landscape of each cancer genome is thus the cumulative result of multiple mutational signatures, each caused by a distinct process such as methylation, APOBEC-mediated changes, etc. [1]. Typically, signatures of mutational processes are determined by considering the trinucleotide context of single base substitutions. If all mutations are presented based on changes in the same DNA strand, there are 96 possible
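To make the count of 96 concrete, the following is a minimal sketch (not CANCERSIGN's code) that enumerates trinucleotide substitution types under the common convention of reporting each substitution from the pyrimidine (C or T) strand: 6 substitution classes times 4 possible 5' bases times 4 possible 3' bases. The label format `T[C>T]A` and the helper names `mutation_types` and `normalize` are assumptions made for this illustration.

```python
from itertools import product

BASES = "ACGT"
# Reporting every substitution from the pyrimidine (C or T) of the mutated
# base pair collapses the 12 possible base changes into 6 classes.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutation_types():
    """Return all labels of the form 5'-base[ref>alt]3'-base, e.g. 'T[C>T]A'."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def normalize(context, alt):
    """Map a trinucleotide context and alternate base to its pyrimidine-strand label.

    `context` is the reference trinucleotide centred on the mutated base;
    `alt` is the mutated base on the same strand (hypothetical input layout).
    """
    if context[1] in "AG":  # purine reference: switch to the reverse complement
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


types = mutation_types()
print(len(types))              # number of distinct trinucleotide mutation types
print(normalize("TGA", "A"))   # a G>A change reported from the opposite strand
```

With two flanking bases on each side instead of one (the 5-mer case), the same enumeration gives 6 × 4^4 = 1536 types.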
Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths in the world. It has been reported that ~10%-15% of individuals with colorectal cancer carry a causative mutation in known susceptibility genes, highlighting the importance of identifying mutations for early detection in high risk individuals. Through extensive sequencing projects such as the International Cancer Genome Consortium (ICGC), a large number of somatic point mutations have been identified that can be used to identify cancer-associated genes, as well as the signatures of mutational processes defined by the tri-nucleotide sequence context (motif) of mutated sites. Mutation is the hallmark of the cancer genome, and many studies have reported cancer subtyping based on the type of frequently mutated genes or on the proportion of mutational processes. However, none of these cancer subtyping methods consider these features simultaneously. This highlights the need for a better and more inclusive subtype classification approach to enable biomarker discovery and thus inform drug development for CRC. In this study, we developed a statistical pipeline based on a novel concept, 'gene-motif', which merges mutated gene information with the tri-nucleotide motif of mutated sites, to identify cancer subtypes, in this case of CRC. Our analysis identified, for the first time, 3,131 gene-motif combinations that were significantly mutated in 536 ICGC colorectal cancer samples compared to other cancer types, and used them to identify seven CRC subtypes with distinguishable phenotypes and biomarkers. Interestingly, we identified several genes that were mutated in multiple subtypes but with unique sequence contexts. Taken together, our results highlight the importance of considering both the mutation type and the mutated genes in the identification of cancer subtypes and cancer biomarkers.
Background: Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths worldwide. Recent studies have observed causative mutations in susceptibility genes related to colorectal cancer in 10 to 15% of patients. This highlights the importance of identifying mutations for early detection of this cancer and more effective treatment among high risk individuals. Mutation is a central focus of cancer research. Many studies have performed cancer subtyping based on the type of frequently mutated genes or on the proportion of mutational processes. However, to the best of our knowledge, these features have never been combined for this task. This highlights the potential to introduce better and more inclusive subtype classification approaches that use a wider range of related features to enable biomarker discovery and thus inform drug development for CRC.

Results: In this study, we develop a new pipeline based on a novel concept called ‘gene-motif’, which merges mutated gene information with the tri-nucleotide motif of mutated sites, for colorectal cancer subtype identification. We apply our pipeline to the International Cancer Genome Consortium (ICGC) CRC samples and identify, for the first time, 3131 gene-motif combinations that are significantly mutated in 536 ICGC colorectal cancer samples. Using these features, we identify seven CRC subtypes with distinguishable phenotypes and biomarkers, including unique cancer-related signaling pathways, for most of which targeted treatment options are currently available. Interestingly, we also identify several genes that are mutated in multiple subtypes but with unique sequence contexts.

Conclusion: Our results highlight the importance of considering both the mutation type and the mutated genes in the identification of cancer subtypes and cancer biomarkers. The new CRC subtypes presented in this study demonstrate distinct phenotypic properties which can be used to develop new treatments. By knowing the genes and phenotypes associated with the subtypes, a personalized treatment plan can be developed that considers the specific phenotypes associated with each patient's genomic lesion.
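The 'gene-motif' idea described above, pairing each mutated gene with the tri-nucleotide motif of the mutated site, can be illustrated with a small counting sketch. This is not the authors' pipeline: it only builds a sample-by-feature count table, and it omits the significance testing against other cancer types and the subtype clustering. The record layout `(sample_id, gene, motif)`, the function names, and the example gene names and motifs are hypothetical and are not results from the study.

```python
from collections import Counter, defaultdict


def gene_motif_counts(mutations):
    """Count mutations per sample for each (gene, motif) pair.

    `mutations` is an iterable of (sample_id, gene, motif) tuples, where
    motif is a tri-nucleotide label such as 'T[C>T]A' (assumed layout).
    """
    counts = defaultdict(Counter)
    for sample_id, gene, motif in mutations:
        counts[sample_id][(gene, motif)] += 1
    return counts


def feature_matrix(counts):
    """Return sorted sample ids, sorted gene-motif features, and a dense count matrix."""
    samples = sorted(counts)
    features = sorted({feature for c in counts.values() for feature in c})
    matrix = [[counts[s][f] for f in features] for s in samples]
    return samples, features, matrix


# Illustrative records only; not data from the study.
example = [
    ("S1", "APC", "A[C>T]G"),
    ("S1", "APC", "A[C>T]G"),
    ("S1", "KRAS", "G[C>A]C"),
    ("S2", "APC", "T[C>T]A"),
]
samples, features, matrix = feature_matrix(gene_motif_counts(example))
for sample, row in zip(samples, matrix):
    print(sample, dict(zip(features, row)))
```

In this toy table the same gene appears under two different motifs in two samples, which is the kind of distinction a gene-only or motif-only feature set would lose.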
Non-coding RNAs (ncRNAs) form a large portion of the mammalian genome. However, their biological functions are poorly characterized in cancers. In this study, using a newly developed tool, SomaGene, we analyze de novo somatic point mutations from the International Cancer Genome Consortium (ICGC) whole-genome sequencing data of 1,855 breast cancer samples. We identify 1,030 candidate ncRNAs that are significantly and specifically mutated in breast cancer samples. By integrating data from the ENCODE regulatory features and the FANTOM5 expression atlas, we show that the candidate ncRNAs are significantly enriched for active chromatin histone marks (1.9 times), CTCF binding sites (2.45 times), DNase accessibility (1.76 times), HMM-predicted enhancers (2.26 times) and eQTL polymorphisms (1.77 times). Importantly, we show that the 1,030 ncRNAs contain a much higher level (3.64 times) of breast cancer-associated genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) than the genome-wide expectation. Such enrichment has not been seen with GWAS SNPs from other cancers. Using Hi-C data from breast cell lines, we then show that 82% of our candidate ncRNAs (1.9 times) significantly interact with the promoters of protein-coding genes, including previously known cancer-associated genes, suggesting a critical role for the candidate ncRNA genes in the activation of essential regulators of development and differentiation in breast cancer. We provide an extensive web-based resource (https://www.ihealthe.unsw.edu.au/research) to share our results with the research community. Our list of breast cancer-specific ncRNA genes has the potential to provide a better understanding of the underlying genetic causes of breast cancer. Lastly, the tool developed in this study can be used to analyze somatic mutations in all cancers.