Model-free feature screening via distance correlation for ultrahigh dimensional survival data

Zhang, Jing; Liu, Yanyan; Cui, Hengjian

doi:10.1007/s00362-020-01210-3

Cited by 11 publications

(10 citation statements)

References 37 publications


“…However, we have to acknowledge that the improvements remain modest. Other procedures than ISIS have been recently proposed, but are not yet implemented, nor the code provided with the original article [72, 73]. A benchmark among all these procedures would be an interesting perspective.…”

Section: Discussion (mentioning)

confidence: 99%

Prognosis of lasso-like penalized Cox models with tumor profiling improves prediction over clinical data alone and benefits from bi-dimensional pre-screening

et al. 2022


Background: Prediction of patient survival from tumor molecular ‘-omics’ data is a key step toward personalized medicine. Cox models performed on RNA profiling datasets are popular for clinical outcome predictions. But these models are applied in the context of “high dimension”, as the number p of covariates (gene expressions) greatly exceeds the number n of patients and e of events. Thus, pre-screening together with penalization methods are widely used for dimensional reduction.

Methods: In the present paper, (i) we benchmark the performance of the lasso penalization and three variants (i.e., ridge, elastic net, adaptive elastic net) on 16 cancers from TCGA after pre-screening, (ii) we propose a bi-dimensional pre-screening procedure based on both gene variability and p-values from single variable Cox models to predict survival, and (iii) we compare our results with iterative sure independence screening (ISIS).

Results: First, we show that integration of mRNA-seq data with clinical data improves predictions over clinical data alone. Second, our bi-dimensional pre-screening procedure can only improve, in moderation, the C-index and/or the integrated Brier score, while excluding irrelevant genes for prediction. We demonstrate that the different penalization methods reached comparable prediction performances, with slight differences among datasets. Finally, we provide advice in the case of multi-omics data integration.

Conclusions: Tumor profiles convey more prognostic information than clinical variables such as stage for many cancer subtypes. Lasso and Ridge penalizations perform similarly to Elastic Net penalizations for Cox models in high-dimension. Pre-screening of the top 200 genes in terms of single variable Cox model p-values is a practical way to reduce dimension, which may be particularly useful when integrating multi-omics.


“…The results of Case (c) demonstrate that CDC-SIS procedure performs well no matter for the completely random censoring mechanism or the informative censoring mechanism. Moreover, the results of Case (d) show that the proposed procedure is robust to $p$, the number of covariates; $\sigma_1 = 1$; $\sigma_2 = (Z_1^2 + Z_2^2 + Z_3^2)^{-1}$; FAST: the screening procedure of Gorst-Rasmussen & Scheike (2013); P-SIS: the screening procedure of Zhao & Li (2012); CRIS: the screening procedure of Song et al (2014); CCRIS: the screening procedure of Zhang et al (2018); C-SIRS: the screening procedure of Zhou & Zhu (2017); DCSIS: the screening procedure of Zhang et al (2021); COXCS: the conditional screening procedure of Hong et al (2018); CDC-SIS: the proposed CDC-based conditional screening procedure.…”

Section: Simulation Studies (mentioning)

confidence: 99%

“…Gene AA805575, a Germinal-center B-cell signature gene has been known to be predictive to DCBCL patients' survival time in the literature (e.g., Gui & Li, 2005; Liu et al, 2013), we treat it as the conditional variable in our proposed procedure. In particular, we first apply the proposed CDC-SIS procedure to screen the important ones among the 7399 genes and select $p$, the number of covariates; $\sigma_1 = 1$; $\sigma_2 = (Z_1^2 + Z_2^2 + Z_3^2)^{-1}$; FAST: the screening procedure of Gorst-Rasmussen & Scheike (2013); P-SIS: the screening procedure of Zhao & Li (2012); CRIS: the screening procedure of Song et al (2014); CCRIS: the screening procedure of Zhang et al (2018); C-SIRS: the screening procedure of Zhou & Zhu (2017); DCSIS: the screening procedure of Zhang et al (2021); COXCS: the conditional screening procedure of Hong et al (2018); CDC-SIS: the proposed CDC-based conditional screening procedure.…”

Section: A Real Example (mentioning)

confidence: 99%


Model‐free conditional screening for ultrahigh‐dimensional survival data via conditional distance correlation

et al. 2022

Self Cite


How to select the active variables that have significant impact on the event of interest is a very important and meaningful problem in the statistical analysis of ultrahigh‐dimensional data. In many applications, researchers often know that a certain set of covariates are active variables from some previous investigations and experiences. With the knowledge of the important prior knowledge of active variables, we propose a model‐free conditional screening procedure for ultrahigh dimensional survival data based on conditional distance correlation. The proposed procedure can effectively detect the hidden active variables that are jointly important but are weakly correlated with the response. Moreover, it performs well when covariates are strongly correlated with each other. We establish the sure screening property and the ranking consistency of the proposed method and conduct extensive simulation studies, which suggests that the proposed procedure works well for practical situations. Then, we illustrate the new approach through a real dataset from the diffuse large‐B‐cell lymphoma study S1.


“…Many authors have considered the generalizations of the SIS procedure to the screening of important features based on right-censored failure time data. In general, the developed procedures can be classified into two types, model-based ones (Tibshirani, 2009; Fan, Feng and Wu, 2010; Zhao and Li, 2012; Gorst-Rasmussen and Scheike, 2013) and model-free methods (Song et al, 2014; Wu and Yin, 2015; Zhang, Liu and Wu, 2017; Zhou and Zhu, 2017; Liu, Zhang and Zhao, 2018; Zhang et al, 2018; Lin, Liu and Hao, 2018; Zhang, Liu and Cui, 2020). However, limited SIS methods have been studied for IC failure time data.…”

Section: Introduction (mentioning)

confidence: 99%

A New Model-Free Feature Screening Procedure for Ultrahigh-Dimensional Interval-Censored Failure Time Data

Zhang, Du, Liu et al. 2023

STAT SINICA

Self Cite


Screening important features based on ultrahigh-dimensional data has become one of the important tasks in statistical analysis, and correspondingly, various screening procedures have been proposed for various types of studies or data including complete data and right-censored failure time data. In this paper, we consider ultrahigh-dimensional interval-censored failure time data, which frequently occur in medical follow-up studies among others and include right-censored data as a special case but for which only limited work exists. For the problem, a distance correlation-based sure independent screening procedure is proposed, and the new approach is model-free and does not require the estimation of survival functions unlike most of the existing nonparametric screening procedures for failure time data. We establish the sure screening property and the ranking consistency of the proposed method and conduct an extensive simu-
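For orientation, the general idea of distance correlation-based marginal screening can be sketched as below. This is a minimal illustration for a fully observed (uncensored) response, using the standard sample distance correlation; it does not reproduce how the procedures cited on this page handle right-censored or interval-censored data. The function names `distance_correlation` and `screen_features` and the cutoff `d` are hypothetical choices made for this sketch.

```python
import numpy as np

def _double_centered_distances(v):
    """Pairwise |v_i - v_j| matrix of a 1-D sample, double centered."""
    v = np.asarray(v, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples (value in [0, 1])."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2_xy = (a * b).mean()   # squared sample distance covariance
    dvar2_x = (a * a).mean()    # squared sample distance variance of x
    dvar2_y = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        # A constant sample has zero distance variance; report 0 by convention.
        return 0.0
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

def screen_features(X, y, d):
    """Rank the columns of X by distance correlation with y and keep the top d.

    Assumption for this sketch: y is fully observed (no censoring).
    Returns (indices of the d highest-scoring columns, all scores).
    """
    X = np.asarray(X, dtype=float)
    scores = np.array([distance_correlation(X[:, j], y)
                       for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:d]
    return top, scores
```

Because distance correlation is zero in the population only under independence, a ranking of this kind can pick up nonlinear marginal dependence that a Pearson-correlation ranking would miss, which is the usual motivation for model-free screening.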
