Non-coding RNAs (ncRNAs) form a large portion of the mammalian genome. However, their biological functions are poorly characterized in cancers. In this study, using a newly developed tool, SomaGene, we analyze de novo somatic point mutations from the International Cancer Genome Consortium (ICGC) whole-genome sequencing data of 1,855 breast cancer samples. We identify 1,030 candidate ncRNAs that are significantly and explicitly mutated in breast cancer samples. By integrating data from the ENCODE regulatory features and the FANTOM5 expression atlas, we show that the candidate ncRNAs are significantly enriched for active chromatin histone marks (1.9 times), CTCF binding sites (2.45 times), DNase accessibility (1.76 times), HMM-predicted enhancers (2.26 times) and eQTL polymorphisms (1.77 times). Importantly, we show that the 1,030 ncRNAs contain a much higher level (3.64 times) of breast cancer-associated genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) than the genome-wide expectation. Such enrichment has not been seen with GWAS SNPs from other cancers. Using breast cell line-related Hi-C data, we then show that 82% of our candidate ncRNAs (1.9 times) significantly interact with the promoters of protein-coding genes, including previously known cancer-associated genes, suggesting a critical role for the candidate ncRNA genes in the activation of essential regulators of development and differentiation in breast cancer. We provide an extensive web-based resource (https://www.ihealthe.unsw.edu.au/research) to communicate our results to the research community. Our list of breast cancer-specific ncRNA genes has the potential to provide a better understanding of the underlying genetic causes of breast cancer. Lastly, the tool developed in this study can be used to analyze somatic mutations in all cancers.
De novo somatic point mutations identified in breast cancer are predominantly non-coding and are typically attributed to altered regulatory elements such as enhancers and promoters. However, while non-coding RNAs (ncRNAs) form a large portion of the mammalian genome, their biological functions are mostly poorly characterized in cancers. In this study, using a newly developed tool, SomaGene, we reanalyze de novo somatic point mutations from the International Cancer Genome Consortium (ICGC) whole-genome sequencing data of 1,855 breast cancers. We identify 929 candidate ncRNAs that are significantly and explicitly mutated in breast cancer samples. By integrating data from the ENCODE regulatory features and the FANTOM5 expression atlas, we show that the candidate ncRNAs in breast cancer samples are significantly enriched for active chromatin histone marks (1.9 times), CTCF binding sites (2.45 times), DNase accessibility (1.76 times), HMM-predicted enhancers (2.26 times) and eQTL polymorphisms (1.77 times). Importantly, we show that the 929 ncRNAs contain a much higher level (3.64 times) of breast cancer-associated genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) than the genome-wide expectation. Such enrichment has not been seen with GWAS SNPs from other diseases. Using breast tissue-related Hi-C data, we then show that 87% of our candidate ncRNAs (1.9 times) significantly interact with the promoters of protein-coding genes, including previously known cancer-associated genes, suggesting a critical role for the candidate ncRNA genes in the activation of essential regulators of development and differentiation in breast cancer. We provide an extensive web-based resource (http://ncrna.ictic.sharif.edu) to communicate our results to the research community. Our list of breast cancer-specific ncRNA genes has the potential to provide a better understanding of the underlying genetic causes of breast cancer. Lastly, the tool developed in this study can be used in the analysis of somatic mutations in all cancers.
Previous studies demonstrate the critical importance of non-coding RNAs interfacing with chromatin-modifying machinery, resulting in promoter-enhancer-based gene regulation, and raise the possibility that many other enhancer-like RNAs may operate via similar mechanisms. Critically, more than 80% of the disease-linked variations identified in genome-wide studies are located in the non-coding regions of genomes, especially in non-coding RNAs, suggesting that non-coding RNAs are relevant to disease. Thus, a critical path forward for understanding the role of non-coding RNAs, especially long non-coding RNAs, is to understand the transcriptional regulation of genomic regions, especially non-coding regions. Here, we developed a user-friendly R package called SomaGene for studying and identifying enhancer-like non-coding RNAs with enriched somatic mutations in the cancer genome. SomaGene accepts different genomic variants (whole-genome or whole-exome somatic point mutations, structural variations, copy number variations) to identify RNAs that are significantly mutated in diseases (e.g., cancer). It then uses multiple publicly available genomics and epigenetics datasets, including ENCODE epigenomics annotations, FANTOM5 tissue-specific expression profiles, disease-associated genome-wide association SNPs, and tissue-specific eQTL pairs, to identify RNAs with potential enhancer function. As an R package, SomaGene gives cancer scientists the opportunity to study the roles of non-coding RNAs in different cancer genomes.
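The abstract does not state which statistical test SomaGene applies when it calls an RNA "significantly mutated". As a generic illustration only, and not SomaGene's method, one common way to ask whether a region carries more mutations than expected is a one-sided binomial test under a uniform background mutation rate. The function names, parameters, and counts below are all hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


def region_mutation_pvalue(
    mutations_in_region: int,
    total_mutations: int,
    region_length: int,
    genome_length: int,
) -> float:
    """One-sided p-value for a region being mutated more often than expected.

    Assumption: every mutation falls in the region with probability
    region_length / genome_length (a uniform background rate).
    """
    p = region_length / genome_length
    return binomial_upper_tail(mutations_in_region, total_mutations, p)


# Illustrative numbers only; they are not taken from the study.
p_value = region_mutation_pvalue(
    mutations_in_region=12,
    total_mutations=5_000,
    region_length=2_000,
    genome_length=3_000_000,
)
print(f"p-value: {p_value:.3g}")
```

A uniform background rate is a strong simplification, since real mutation rates vary with replication timing, sequence context, and sample, so a production analysis would normally use a covariate-aware background model and correct for multiple testing across all tested RNAs.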
Non-coding RNAs (ncRNAs) form a large portion of the mammalian genome; however, their biological functions are poorly characterized in cancers. In this study, using a newly developed tool, SomaGene, we analyze de novo somatic point mutations from the International Cancer Genome Consortium (ICGC) whole-genome sequencing data of 1,855 breast cancers. We identify 929 candidate ncRNAs that are significantly and explicitly mutated in breast cancer samples. By integrating data from the ENCODE regulatory features and the FANTOM5 expression atlas, we show that the candidate ncRNAs in breast cancer samples are significantly enriched for active chromatin histone marks (1.9 times), CTCF binding sites (2.45 times), DNase accessibility (1.76 times), HMM-predicted enhancers (2.26 times) and eQTL polymorphisms (1.77 times). Importantly, we show that the 929 ncRNAs contain a much higher level (3.64 times) of breast cancer-associated genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) than the genome-wide expectation. Such enrichment has not been seen with GWAS SNPs from other diseases. Using breast tissue-related Hi-C data, we then show that 82% of our candidate ncRNAs (1.9 times) significantly interact with the promoters of protein-coding genes, including previously known cancer-associated genes, suggesting a critical role for the candidate ncRNA genes in the activation of essential regulators of development and differentiation in breast cancer. We provide an extensive web-based resource (https://www.ihealthe.unsw.edu.au/research) to communicate our results to the research community. Our list of breast cancer-specific ncRNA genes has the potential to provide a better understanding of the underlying genetic causes of breast cancer. Lastly, the tool developed in this study can be used in the analysis of somatic mutations in all cancers.
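The "times" figures quoted in the abstracts above are fold enrichments relative to a genome-wide expectation. The abstracts do not give the exact counting scheme, so the following is a minimal sketch that assumes a simple ratio of proportions: the fraction of candidate ncRNAs overlapping a feature, divided by the fraction of a background set overlapping the same feature. The function name, its parameters, and the example counts are hypothetical.

```python
def fold_enrichment(
    hits_in_candidates: int,
    n_candidates: int,
    hits_in_background: int,
    n_background: int,
) -> float:
    """Ratio of the observed overlap fraction to the background fraction."""
    if n_candidates <= 0 or n_background <= 0:
        raise ValueError("set sizes must be positive")
    if hits_in_background <= 0:
        raise ValueError("background must contain at least one hit")
    observed = hits_in_candidates / n_candidates
    expected = hits_in_background / n_background
    return observed / expected


# Illustrative counts only; they are not taken from the study.
# 50 of 1,000 candidates overlap the feature versus 250 of 10,000 background regions.
print(fold_enrichment(50, 1_000, 250, 10_000))  # 0.05 / 0.025 = 2.0
```

A fold enrichment on its own says nothing about significance; it is usually reported alongside a test such as a hypergeometric or permutation test on the same counts.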