Identifying Prognostic Gene Signatures Using a Network-Based Approach

Swetha Bose Nutakki

The main objective of this study is to develop a novel network-based methodology for identifying prognostic gene signatures that can predict recurrence in cancer. Feature selection algorithms have been widely used to identify gene signatures in genome-wide association studies. However, most of them do not discover causal relationships among the features, and they must trade accuracy against complexity. Network-based techniques take the molecular interactions between pairs of genes into account, which makes them a more efficient means of finding gene signatures; they also achieve better classification accuracy without compromising on complexity. Nevertheless, the network-based techniques currently in use each have limitations. Correlation-based coexpression networks do not provide predictive structure or causal relations among the genes. Bayesian networks cannot model feedback loops. Boolean networks can model small-scale molecular networks, but not networks at the genome scale. Implication networks induced by prediction logic were therefore chosen to generate genome-wide coexpression networks, because they integrate formal logic and statistics and overcome the limitations of the other network-based techniques.

The first part of the study covers building an implication network and identifying a set of genes that could form a prognostic signature. The data consisted of 442 samples taken from four different sources. The data were split into a training set, UM/HLM (n=256), and two test sets, DFCI (n=82) and MSK (n=104). The training set was used to generate the implication network and, from it, to identify the prognostic signature. The test sets were used to validate the resulting signature.
The implication networks were built from the gene expression data associated with two disease states (metastasis or non-metastasis), defined by the period and status of post-operative survival. The gene interactions that differentiated the two disease states, called the differential components, were identified. The major cancer hallmarks (E2F, EGF, EGFR, KRAS, MET, RB1, and TP53) were considered, and the genes in the differential components that interacted with all of these hallmarks were selected to form a 31-gene prognostic signature. An R software package with embedded C code was created to automate this process. Next, the signature was fitted to a Cox proportional hazards model, and the point on the ROC curve nearest to perfect classification was taken as the best scheme for patient stratification in Kaplan-Meier analyses on the training set (log-rank p-value = 1.97e-08) and on the two test sets, DFCI (log-rank p-value = 2.13e-05) and MSK (log-rank p-value = 1.24e-04). Prognostic validation was carried out on the test sets using methods such as Concordance Probability Estimate (CPE) and Gene Set Enrichment Analysis (GSEA). The accuracy of this signature was evalua...
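The cutoff-selection step described above, choosing the ROC point nearest to perfect classification, can be illustrated with a minimal sketch. This is not the thesis's R/C package. It assumes, for illustration only, that each patient has a scalar risk score (for example, from a fitted Cox model) and a binary recurrence label, and that "nearest" means Euclidean distance to the corner (FPR = 0, TPR = 1); the abstract states neither detail. The function name `closest_to_perfect_cutoff` and the toy data are hypothetical.

```python
import numpy as np


def closest_to_perfect_cutoff(risk_scores, events):
    """Return (cutoff, fpr, tpr) for the ROC point nearest to (FPR=0, TPR=1).

    Assumptions (not from the source): binary event labels, patients with
    score >= cutoff are called high-risk, and Euclidean distance is used.
    """
    scores = np.asarray(risk_scores, dtype=float)
    labels = np.asarray(events, dtype=bool)
    n_pos = labels.sum()
    n_neg = (~labels).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")

    best = None
    # Every distinct observed score is a candidate cutoff.
    for cutoff in np.unique(scores):
        predicted_high = scores >= cutoff
        tpr = (predicted_high & labels).sum() / n_pos
        fpr = (predicted_high & ~labels).sum() / n_neg
        # Distance from this ROC point to the perfect-classification corner.
        dist = np.hypot(fpr, 1.0 - tpr)
        if best is None or dist < best[0]:
            best = (dist, float(cutoff), float(fpr), float(tpr))
    return best[1], best[2], best[3]


# Hypothetical toy data: three non-recurrent and three recurrent patients.
toy_scores = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
toy_events = [0, 0, 0, 1, 1, 1]
cutoff, fpr, tpr = closest_to_perfect_cutoff(toy_scores, toy_events)
```

Patients at or above the returned cutoff would form the high-risk group and the rest the low-risk group, and the two groups could then be compared with Kaplan-Meier curves and a log-rank test. With censored survival data, a time-dependent ROC analysis may be more appropriate than this binary simplification.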