The selection of effective genes that accurately predict chemotherapy responses might improve cancer outcomes. We compare optimized gene signatures for cisplatin, carboplatin, and oxaliplatin responses in the same cell lines and validate each signature using data from patients with cancer. Supervised support vector machine learning is used to derive gene sets whose expression is related to the cell line GI50 values by backwards feature selection with cross-validation. Specific genes and functional pathways distinguishing sensitive from resistant cell lines are identified by contrasting signatures obtained at extreme and median GI50 thresholds. Ensembles of gene signatures at different thresholds are combined to reduce the dependence on specific GI50 values for predicting drug responses. The most accurate gene signatures for each platin are: cisplatin: BARD1, BCL2, BCL2L1, CDKN2C, FAAP24, FEN1, MAP3K1, MAPK13, MAPK3, NFKB1, NFKB2, SLC22A5, SLC31A2, TLR4, and TWIST1; carboplatin: AKT1, EIF3K, ERCC1, GNGT1, GSR, MTHFR, NEDD4L, NLRP1, NRAS, RAF1, SGK1, TIGD1, TP53, VEGFB, and VEGFC; and oxaliplatin: BRAF, FCGR2A, IGF1, MSH2, NAGK, NFE2L2, NQO1, PANK3, SLC47A1, SLCO1B1, and UGT1A1. Data from The Cancer Genome Atlas (TCGA) patients with bladder, ovarian, and colorectal cancer were used to test the cisplatin, carboplatin, and oxaliplatin signatures, resulting in 71.0%, 60.2%, and 54.5% accuracies in predicting disease recurrence and 59%, 61%, and 72% accuracies in predicting remission, respectively. One cisplatin signature predicted 100% of recurrence in non-smoking patients with bladder cancer (57% disease-free; N = 19), and 79% recurrence in smokers (62% disease-free; N = 35). This approach should be adaptable to other studies of chemotherapy responses, regardless of the drug or cancer types.
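The backward feature selection with cross-validation described above can be sketched roughly as follows. This is an illustrative toy only: a nearest-centroid classifier is swapped in for the support vector machine so that the example needs nothing beyond the standard library, leave-one-out is used as the cross-validation scheme, and the gene names (`GENE_A`, `GENE_B`, `GENE_C`), expression values, GI50 values, and threshold are all invented for the sketch.

```python
from statistics import mean

def predict_nearest_centroid(train_rows, train_labels, row, genes):
    """Assign `row` to the class whose mean expression profile is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        dist = sum((row[g] - mean(m[g] for m in members)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(rows, labels, genes):
    """Leave-one-out cross-validated accuracy for a given gene subset."""
    hits = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        hits += predict_nearest_centroid(train_rows, train_labels, row, genes) == label
    return hits / len(rows)

def backward_select(rows, labels, genes):
    """Greedily drop genes as long as cross-validated accuracy does not fall."""
    current = list(genes)
    best = loo_accuracy(rows, labels, current)
    while len(current) > 1:
        # Score every candidate removal; ties are broken by gene name.
        trials = [
            (loo_accuracy(rows, labels, [g for g in current if g != drop]), drop)
            for drop in current
        ]
        acc, drop = max(trials)
        if acc < best:
            break
        best = acc
        current.remove(drop)
    return current, best

# Hypothetical cell-line data: expression per gene plus a GI50 value.
rows = [
    {"GENE_A": 0.1, "GENE_B": 1.0, "GENE_C": 2.0},
    {"GENE_A": 0.4, "GENE_B": 2.5, "GENE_C": 0.2},
    {"GENE_A": 0.2, "GENE_B": 0.3, "GENE_C": 1.1},
    {"GENE_A": 10.1, "GENE_B": 2.0, "GENE_C": 0.9},
    {"GENE_A": 9.8, "GENE_B": 0.5, "GENE_C": 2.8},
    {"GENE_A": 10.3, "GENE_B": 1.5, "GENE_C": 1.6},
]
gi50 = [1.0, 1.5, 2.0, 8.0, 9.0, 7.5]
threshold = 5.0  # assumed GI50 cut-off separating sensitive from resistant
labels = ["resistant" if v > threshold else "sensitive" for v in gi50]

selected, accuracy = backward_select(rows, labels, ["GENE_A", "GENE_B", "GENE_C"])
print(selected, accuracy)
```

Repeating the same procedure with labels derived at several different GI50 thresholds would give one signature per threshold, which is the starting point for the ensembles mentioned in the abstract.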
Numerous genetic factors that influence breast cancer risk are known. However, approximately two-thirds of the overall familial risk remains unexplained. To determine whether some of the missing heritability is due to rare variants conferring high to moderate risk, we tested for an association between the c.5791C>T nonsense mutation (p.Arg1931*; rs144567652) in exon 22 of the FANCM gene and breast cancer. An analysis of genotyping data from 8635 familial breast cancer cases and 6625 controls from different countries yielded an association between the c.5791C>T mutation and breast cancer risk [odds ratio (OR) = 3.93 (95% confidence interval (CI) = 1.28-12.11; P = 0.017)]. Moreover, we performed two meta-analyses, one of studies from countries with carriers in both cases and controls and one of all available data. These analyses showed breast cancer associations with OR = 3.67 (95% CI = 1.04-12.87; P = 0.043) and OR = 3.33 (95% CI = 1.09-13.62; P = 0.032), respectively. Based on information theory-based prediction, we established that the mutation caused an out-of-frame deletion of exon 22, due to the creation of a binding site for the pre-mRNA processing protein hnRNP A1. Furthermore, genetic complementation analyses showed that the mutation influenced the DNA repair activity of the FANCM protein. In summary, we provide evidence for the first time showing that the common p.Arg1931* loss-of-function variant in FANCM is a risk factor for familial breast cancer.
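For readers unfamiliar with how an odds ratio and its confidence interval are obtained from carrier counts, here is a minimal sketch using the standard Wald interval on the log odds ratio. The abstract does not report carrier counts or say which interval method was used, so the counts below are hypothetical and the function name `odds_ratio_wald_ci` is illustrative.

```python
import math

def odds_ratio_wald_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio from a 2x2 table with a Wald confidence interval (z=1.96 for 95%)."""
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers
        + 1 / control_carriers + 1 / control_noncarriers
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts, NOT taken from the study.
odds_ratio, lower, upper = odds_ratio_wald_ci(20, 980, 5, 995)
print(f"OR = {odds_ratio:.2f} (95% CI = {lower:.2f}-{upper:.2f})")
```

With rare variants the carrier cells are small, so the interval is wide; that is consistent with the broad intervals quoted in the abstract, although exact methods may be preferred when counts are very low.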
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
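The change in information content for a splice site substitution can be illustrated with a small sketch. It uses the usual per-position term 2 + log2 f(b, l) for the individual information of a sequence, omits the small-sample correction, and relies on an invented four-position frequency table (`FREQS`) that is not a real splice site model. It is not the code behind the Splicing Mutation Calculator.

```python
import math

# Hypothetical base frequencies at each position of a toy 4-nt binding site.
FREQS = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.10, "G": 0.20, "T": 0.10},
]

def individual_information(site, freqs=FREQS):
    """Information content of one site in bits (small-sample correction omitted)."""
    if len(site) != len(freqs):
        raise ValueError("site length must match the model length")
    return sum(2.0 + math.log2(pos[base]) for base, pos in zip(site, freqs))

def delta_information(reference, variant, freqs=FREQS):
    """Change in information content caused by a substitution (variant - reference)."""
    return individual_information(variant, freqs) - individual_information(reference, freqs)

reference_site = "GGTA"
variant_site = "GATA"  # G>A at the second position of the toy site
delta = delta_information(reference_site, variant_site)
print(f"reference = {individual_information(reference_site):.2f} bits, "
      f"change = {delta:.2f} bits")
```

A large negative change at a natural site suggests weakened or abolished recognition, while a smaller decrease is more compatible with reduced (leaky) splicing, in line with the distinction drawn in the abstract.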
Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation, and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) was then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold [...] (..., ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation. (2018, 7:233; last updated: 20 MAR 2019)
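The mRMR ranking step mentioned in the Methods can be sketched as a greedy loop that rewards relevance to the exposure label and penalizes redundancy with genes already chosen. mRMR is usually formulated with mutual information; absolute Pearson correlation is substituted here to keep the sketch dependency-free, and the gene names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def mrmr_rank(features, target):
    """Greedy ranking by relevance minus mean redundancy (correlation-based stand-in)."""
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    remaining = list(features)
    ranked = []
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[chosen])) for chosen in ranked
            ) / len(ranked)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Hypothetical expression values; 0 = unexposed, 1 = exposed.
exposed = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "GENE_A": [1, 2, 1, 2, 5, 6, 5, 6],
    "GENE_B": [2, 4, 2, 4, 10, 12, 10, 12],  # exact rescaling of GENE_A
    "GENE_C": [0, 0, 1, 1, 1, 1, 2, 2],
}
ranked = mrmr_rank(features, exposed)
print(ranked)
```

In this toy data `GENE_B` carries the same information as `GENE_A`, so the less relevant but less redundant `GENE_C` is ranked ahead of it. The ranked list would then feed the sequential feature selection step.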
Mutations that affect mRNA splicing often produce multiple mRNA isoforms, resulting in complex molecular phenotypes. Definition of an exon and its inclusion in mature mRNA relies on joint recognition of both acceptor and donor splice sites. This study predicts cryptic and exon-skipping isoforms in mRNA produced by splicing mutations from the combined information contents (R(i), which measures binding-site strength, in bits) and distribution of the splice sites defining these exons. The total information content of an exon (R(i,total)) is the sum of the R(i) values of its acceptor and donor splice sites, adjusted for the self-information of the distance separating these sites, that is, the gap surprisal. Differences between total information contents of an exon (ΔR(i,total)) are predictive of the relative abundance of these exons in distinct processed mRNAs. Constraints on splice site and exon selection are used to eliminate nonconforming and poorly expressed isoforms. Molecular phenotypes are computed by the Automated Splice Site and Exon Definition Analysis (http://splice.uwo.ca) server. Predictions of splicing mutations were highly concordant (85.2%; n = 61) with published expression data. In silico exon definition analysis will contribute to streamlining assessment of abnormal and normal splice isoforms resulting from mutations.
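The total exon information described above can be written out as a short calculation. The sketch assumes that the adjustment is a subtraction of the gap surprisal, taken as -log2 of the probability of the exon length, and every R(i) value and length probability below is hypothetical. It is not the server's implementation.

```python
import math

def gap_surprisal(length_probability):
    """Self-information (bits) of an exon length with the given probability."""
    return -math.log2(length_probability)

def exon_total_information(ri_acceptor, ri_donor, length_probability):
    """R(i,total): acceptor plus donor information minus the gap surprisal (assumed form)."""
    return ri_acceptor + ri_donor - gap_surprisal(length_probability)

# Hypothetical natural exon and a cryptic exon that uses a weaker donor site.
natural = exon_total_information(ri_acceptor=8.0, ri_donor=9.0, length_probability=0.01)
cryptic = exon_total_information(ri_acceptor=8.0, ri_donor=5.0, length_probability=0.005)

delta_total = natural - cryptic
print(f"natural = {natural:.2f} bits, cryptic = {cryptic:.2f} bits, "
      f"difference = {delta_total:.2f} bits")
```

Under this reading, the exon with the larger total would be predicted to be the more abundant isoform, and the rarer exon length of the cryptic exon widens the gap beyond the difference in donor strength alone.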