With the increased availability of survival datasets that comprise both molecular information (e.g., gene expression) and clinical information (e.g., patient survival), numerous genes have been proposed as prognostic biomarkers. Despite the effort and money invested, very few of these biomarkers have been clinically validated and are used routinely. A high false discovery rate is thought to be largely responsible, in particular because the number of tested genes is extremely high relative to the number of patients followed. This review first describes the historical methodologies on which recent developments are often based, then describes studies performed in the last few years. The concepts are illustrated on a renal cancer dataset, and the corresponding scripts are provided (Supporting Information). These new developments belong to three main fields of application. First, in variable selection, various improvements to lasso penalization have been proposed. Second, the accurate definition of p‐values and control of the false discovery rate have been the subject of many studies. Third, biological knowledge, often in the form of networks or pathways, can be incorporated as prior information and/or used to reduce dimensionality. These new and promising developments deserve benchmarking by independent groups not involved in their development, using various independent datasets. Further methodological work is also still required.
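Control of the false discovery rate is one of the three fields listed above. FDR-control work typically starts from the classical Benjamini-Hochberg step-up procedure, sketched below in stdlib-only Python. This is a minimal illustration and not the scripts from the Supporting Information. The function names are hypothetical, and the adjusted values are intended to follow the same rule as R's `p.adjust(method = "BH")`.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    if m == 0:
        return []
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose rank is <= k_max, reported in input order.
    return sorted(order[:k_max])


def bh_adjusted(pvalues):
    """BH-adjusted p-values: min over ranks j >= k of p_(j) * m / j, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A typical use is to apply `benjamini_hochberg` to the per-gene univariate Cox p-values of a survival dataset. The genes it returns are those declared significant at FDR level `alpha`, and they coincide with the genes whose `bh_adjusted` value is at most `alpha`.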
Background: Prediction of patient survival from tumor molecular ‘-omics’ data is a key step toward personalized medicine. Cox models fitted to RNA profiling datasets are popular for predicting clinical outcome. However, these models are applied in a “high-dimensional” setting, as the number p of covariates (gene expressions) greatly exceeds the number n of patients and the number e of events. Pre-screening combined with penalization methods is therefore widely used for dimension reduction.
Methods: In the present paper, (i) we benchmark the performance of lasso penalization and three variants (ridge, elastic net, adaptive elastic net) on 16 cancers from TCGA after pre-screening, (ii) we propose a bi-dimensional pre-screening procedure, based on both gene variability and p-values from single-variable Cox models, to predict survival, and (iii) we compare our results with iterative sure independence screening (ISIS).
Results: First, we show that integrating mRNA-seq data with clinical data improves predictions over clinical data alone. Second, our bi-dimensional pre-screening procedure improves the C-index and/or the integrated Brier score only moderately, while excluding genes irrelevant for prediction. We show that the different penalization methods reach comparable prediction performance, with slight differences among datasets. Finally, we provide advice for the case of multi-omics data integration.
Conclusions: For many cancer subtypes, tumor profiles convey more prognostic information than clinical variables such as stage. Lasso and ridge penalizations perform similarly to elastic net penalizations for Cox models in high dimension. Pre-screening of the top 200 genes in terms of single-variable Cox model p-values is a practical way to reduce dimension, which may be particularly useful when integrating multi-omics data.
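The pre-screening step described above ranks genes by single-variable Cox model p-values, optionally after a gene-variability filter. The stdlib-only sketch below illustrates this under several stated assumptions:

- It uses the Cox score test at β = 0 with Breslow handling of ties. This test is asymptotically equivalent to, but not necessarily identical with, the Wald or likelihood-ratio p-values reported by a fitted Cox model.
- The variance-quantile cut-off and the function names (`cox_score_pvalue`, `bidimensional_prescreen`) are hypothetical. They do not reproduce the paper's exact bi-dimensional procedure.
- Only the default `top_k=200` mirrors the top-200 figure quoted in the Conclusions.

```python
import math


def _pvar(values):
    """Population variance (divides by len(values))."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def cox_score_pvalue(x, time, event):
    """P-value of the univariate Cox score test of H0: beta = 0 (Breslow ties).

    x: covariate values for one gene, one per patient.
    time: follow-up times; event: 1 if the event was observed, 0 if censored.
    """
    n = len(x)
    score, info = 0.0, 0.0
    for i in range(n):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set at t_i: patients still under observation (time >= t_i).
        risk = [x[j] for j in range(n) if time[j] >= time[i]]
        # Score contribution: observed covariate minus its risk-set average.
        score += x[i] - sum(risk) / len(risk)
        # Information contribution: risk-set variance of the covariate.
        info += _pvar(risk)
    if info == 0.0:
        return 1.0  # constant covariate or no informative events
    stat = score * score / info  # approximately chi-square with 1 d.f. under H0
    # Upper tail of chi-square(1): P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def bidimensional_prescreen(expr, time, event, min_var_quantile=0.5, top_k=200):
    """Hypothetical two-criterion screen: variability filter, then top-k by p-value.

    expr: dict mapping gene name -> expression values (one per patient).
    Returns up to top_k gene names, most significant first.
    """
    variances = {g: _pvar(v) for g, v in expr.items()}
    ranked = sorted(variances.values())
    # Variance cut-off at the requested quantile (lower-index rule, for simplicity).
    cut = ranked[int(min_var_quantile * (len(ranked) - 1))]
    kept = [g for g, v in variances.items() if v >= cut]
    # Rank the remaining genes by univariate Cox score-test p-value.
    pvals = {g: cox_score_pvalue(expr[g], time, event) for g in kept}
    return sorted(pvals, key=pvals.get)[:top_k]
```

Here is a toy example with eight patients who all experience the event at times 1 to 8:

- `gene_a`: expression decreases with survival time, so its p-value is very small.
- `gene_c`: expression is unrelated to survival time.
- `gene_b`: expression is constant, so it is removed by the variability filter.

The penalized Cox fit (lasso, ridge or elastic net) would then be run only on the returned genes, for example with an existing implementation such as R's glmnet with `family = "cox"`. As written, the screen costs O(n²) per gene, which is acceptable for illustration but not optimized.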