Background
In recent years, research on cancer predisposition germline variants has emerged as a prominent field. Identifying somatic mutations depends on a reliable mapping of the patient's germline variants. In addition, the statistics of germline variant frequencies in healthy individuals and cancer patients are the basis for seeking candidate cancer predisposition genes. The Cancer Genome Atlas (TCGA) is one of the main sources of such data, providing a diverse collection of molecular data, including deep sequencing, for more than 30 types of cancer from > 10,000 patients.

Methods
We hypothesized that whole-exome sequences from blood samples of cancer patients should not show systematic differences among cancer types. To test this hypothesis, we analyzed common and rare germline variants across six cancer types, covering 2241 samples from TCGA. In our analysis we accounted for inherent variables in the data, including the different variant calling protocols, sequencing platforms, and ethnicity.

Results
We report substantial batch effects in germline variants associated with cancer types. We attribute the effect to the specific sequencing centers that produced the data. Specifically, we measured 30% variability in the number of reported germline variants per sample across sequencing centers. The batch effect is further expressed in nucleotide composition and variant frequencies. Importantly, the batch effect causes substantial differences in germline variant distribution patterns across numerous genes, including prominent cancer predisposition genes such as BRCA1, RET, MAX, and KRAS. For most known cancer predisposition genes, we found a distinct batch-dependent difference in germline variants.

Conclusion
TCGA germline data is subject to strong batch effects, with substantial variability among TCGA sequencing centers. We claim that these batch effects are consequential for numerous TCGA pan-cancer studies. In particular, they may compromise the reliability and power of efforts to detect new cancer predisposition genes. Furthermore, the interpretation of pan-cancer analyses should be revisited in view of the source of the genomic data, after accounting for the reported batch effects.

Electronic supplementary material
The online version of this article (10.1186/s12885-019-5994-5) contains supplementary material, which is available to authorized users.
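To make the per-center variability reported in the Results concrete, here is a minimal sketch of one way such a figure could be computed. The abstract does not say how the 30% was measured, so the definition used here (spread of per-center mean variant counts, relative to the mean across centers) is an assumption, and the center names and counts are made-up toy values rather than TCGA data.

```python
from collections import defaultdict
from statistics import mean

def center_variability(samples):
    """samples: iterable of (sequencing_center, n_germline_variants) pairs.

    Returns the per-center mean variant count and the relative spread of
    those means: (max - min) / mean of the center means. This definition
    of "variability" is an assumption, not the paper's stated metric.
    """
    by_center = defaultdict(list)
    for center, n_variants in samples:
        by_center[center].append(n_variants)
    center_means = {c: mean(counts) for c, counts in by_center.items()}
    values = list(center_means.values())
    spread = (max(values) - min(values)) / mean(values)
    return center_means, spread

# Hypothetical toy counts (not real data): two samples per center.
toy_samples = [
    ("center_A", 20000), ("center_A", 22000),
    ("center_B", 26000), ("center_B", 28000),
    ("center_C", 24000), ("center_C", 24000),
]

center_means, spread = center_variability(toy_samples)
print(center_means)
print(f"relative spread across centers: {spread:.0%}")
```

If cancer type and sequencing center are confounded, as the abstract reports, a spread like this would surface as an apparent difference between cancer types even though germline variant counts are not expected to depend on the tumor.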
The primary function of microRNAs (miRNAs) is to maintain cell homeostasis. In cancerous tissues, miRNA expression undergoes drastic alterations. In this study, we use miRNA expression profiles from The Cancer Genome Atlas for 24 cancer types and 3 healthy tissues, collected from >8500 samples. We seek to classify the cancer's origin and identify the tissue using the expression of 1046 reported miRNAs. Despite the apparently uniform appearance of miRNAs among cancerous samples, we recover indispensable information about the cancer and tissue types from lowly expressed miRNAs. Multiclass support vector machine classification yields an average recall of 58% in identifying the correct tissue and tumor types. Data discretization led to a substantial improvement, reaching an average recall of 91% (95% median). We propose a straightforward protocol as a crucial step in classifying tumors of unknown primary origin. Our counter-intuitive conclusion is that in almost all cancer types, highly expressed miRNAs mask the significant signal that lowly expressed miRNAs provide.
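The masking effect described above can be illustrated with a small sketch. The abstract does not specify the discretization scheme, so the fixed cutoffs below are an assumption, the three-miRNA profiles are invented toy values, and a plain Euclidean distance stands in for the classifier (it is not the multiclass SVM used in the study).

```python
import math

def discretize(profile, cutoffs=(1.0, 10.0, 100.0)):
    """Map each expression value to an ordinal bin: the number of cutoffs
    it reaches (0 = below all cutoffs). The cutoffs are illustrative."""
    return [sum(value >= c for c in cutoffs) for value in profile]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical profiles over three miRNAs: [highly expressed, low_1, low_2].
tissue_x_1 = [52000.0, 0.0, 12.0]
tissue_x_2 = [48000.0, 0.5, 15.0]   # same tissue as tissue_x_1
tissue_y_1 = [51500.0, 14.0, 0.2]   # different tissue

# On raw values, fluctuation in the highly expressed miRNA dominates.
raw_same = euclidean(tissue_x_1, tissue_x_2)
raw_diff = euclidean(tissue_x_1, tissue_y_1)

# After discretization, the low-expression pattern decides the distance.
disc_same = euclidean(discretize(tissue_x_1), discretize(tissue_x_2))
disc_diff = euclidean(discretize(tissue_x_1), discretize(tissue_y_1))

print(raw_same > raw_diff)    # same-tissue pair looks farther apart on raw data
print(disc_same < disc_diff)  # same-tissue pair is closer after discretization
```

In this toy case, the two same-tissue samples differ mostly in the abundant miRNA, so raw distances separate them more than the sample from another tissue. Binning removes that magnitude difference and leaves the on/off pattern of the low-expression miRNAs as the distinguishing feature.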
Expanding the arsenal of prophylactic approaches against SARS-CoV-2 is of utmost importance, specifically those strategies that are resistant to antigenic drift in Spike. Here, we conducted a screen with over 16,000 RNAi triggers against the SARS-CoV-2 genome using a massively parallel assay to identify hyper-potent siRNAs. We selected 10 candidates for in vitro validation and found five siRNAs that exhibited hyper-potent activity, with IC50 < 20 pM and strong neutralisation in live virus experiments. We further enhanced the activity by combinatorial pairing of the siRNA candidates to develop siRNA cocktails and found that these cocktails are active against multiple types of variants of concern (VOC). We examined over 2,000 possible mutations to the siRNA target sites using saturation mutagenesis and identified broad protection against future variants. Finally, we demonstrated that intranasal administration of the siRNA cocktail effectively attenuates clinical signs and viral measures of disease in the Syrian hamster model. Our results pave the way for the development of an additional layer of antiviral prophylaxis that is orthogonal to vaccines and monoclonal antibodies.
It is estimated that up to 10% of cancer incidents are attributed to inherited genetic alterations. Despite extensive research, there are still gaps in our understanding of genetic predisposition to cancer. It was theorized that ultra-rare variants partially account for the missing heritable component. We harness the UK BioBank dataset of ~ 500,000 individuals, 14% of whom were diagnosed with cancer, to detect ultra-rare, possibly high-penetrance cancer predisposition variants. We report on 115 cancer-exclusive ultra-rare variations and nominate 26 variants with additional independent evidence as cancer predisposition variants. We conclude that population cohorts are a valuable source for expanding the collection of novel cancer predisposition genes.

Discovery of cancer predisposition genes (CPGs) has the potential to impact personalized diagnosis and advance genetic counseling. Genetic analysis of family members with high occurrences of cancer has led to the identification of variants that increase the risk of developing cancer [1]. In addition to family-based studies, efforts to identify CPGs focus on pediatric patients, where the contribution of environmental factors is expected to be small. Forty percent of pediatric cancer patients belong to families with a history of cancer [2]. Tumorigenesis results from mis-regulation of one or more of the major cancer hallmarks [3]. Therefore, it is anticipated that CPGs overlap with genes that are often mutated in cancerous tissues. Indeed, the CPGs most prevalent in children (TP53, APC, BRCA2, NF1, PMS2, RB1 and RUNX1) [2] are known cancer driver genes that function as tumor suppressors or oncogenes, or have a role in maintaining DNA stability [4]. Many of the cancer predisposition genes are associated with pathways of DNA repair and homologous recombination [5]. Inherited defects in the ability of cells to repair and cope with DNA damage are considered major factors in predisposition to breast and colorectal cancers [6].
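The idea of a cancer-exclusive ultra-rare variant from the abstract above can be sketched as a simple filter. The carrier-count bounds, the variant and participant identifiers, and the function names are all hypothetical, and the study's actual criteria may differ. The only figure taken from the text is the 14% cancer rate in the cohort.

```python
CANCER_RATE = 0.14  # fraction of the cohort diagnosed with cancer (from the text)

def cancer_exclusive_variants(carriers, cancer_ids, min_carriers=2, max_carriers=10):
    """Return variants whose carriers are few (ultra-rare, per the assumed
    bounds) and are all cancer patients (cancer-exclusive).

    carriers: dict mapping variant id -> set of participant ids.
    cancer_ids: set of participant ids diagnosed with cancer.
    """
    return sorted(
        variant
        for variant, ids in carriers.items()
        if min_carriers <= len(ids) <= max_carriers and ids <= cancer_ids
    )

def chance_all_carriers_have_cancer(n_carriers, rate=CANCER_RATE):
    """Probability that every carrier has cancer if carrier status and
    cancer were independent (a simplifying assumption)."""
    return rate ** n_carriers

# Hypothetical toy cohort, not UK Biobank data.
cancer_ids = {"p1", "p2", "p3", "p7"}
carriers = {
    "var_1": {"p1", "p2", "p3"},  # three carriers, all with cancer
    "var_2": {"p1", "p4"},        # one carrier without cancer
    "var_3": {"p7"},              # single carrier: below min_carriers
}

print(cancer_exclusive_variants(carriers, cancer_ids))
print(chance_all_carriers_have_cancer(3))
```

With a 14% base rate, three out of three carriers having cancer would occur by chance with probability 0.14³ ≈ 0.0027 for a single variant. Across the very large number of ultra-rare variants in a cohort, many such chance hits are expected, which is consistent with the abstract nominating only variants that have additional independent evidence.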
Complementary approaches for seeking CPGs are large-scale genome-wide and exome-wide association studies (GWAS), which are conducted solely on the basis of statistical considerations, without prior knowledge of cancer-promoting genes [7]. Identifying CPGs from GWAS is a challenge for the following reasons: (1) the limited contribution of genetic heritability in certain cancer types; (2) the low effect size or risk associated with each individual variant; (3) low penetrance in view of the individual's background [8]; and (4) low statistical power. Large cohorts of breast cancer show that ~ 2% of cancer cases are associated with mutations in BRCA1 and BRCA2, which are also high-risk ovarian cancer susceptibility genes. Additionally, TP53 and PTEN are associated with early-onset and high-risk familial breast cancer. Mutations in ATM and HRAS1 mildly increase the risk for breast cancer but strongly increase the risk for other cancer types, and a collection of DNA mismatch repair genes (MLH1, MSH2, MSH6, PMS2) is associated with a high risk of developing cancer [9]. A large cohort of Caucasian patients with pancreatic ca...