Random survival forests

Ishwaran, Hemant; Kogalur, Udaya B.; Blackstone, Eugene H.; Lauer, Michael S.

doi:10.1214/08-aoas169

Cited by 2,152 publications

(2,209 citation statements)

References 35 publications

Mentioning: 2,188

“…In anticipation of future studies we intend to perform further comparisons with existing methods [27,33] and further simulations to examine the impact of tuning parameters and prior assumptions on model performance. Our current approach to missing values is to perform imputation prior to modeling; however, we are considering adjusting our method to deal with missing values as these are common in realistic data analysis contexts.…”

Section: Discussion (mentioning)

confidence: 99%

Bayesian Weibull tree models for survival analysis of clinico-genomic data

Clarke, West (2008). Statistical Methodology

An important goal of research involving gene expression data for outcome prediction is to establish the ability of genomic data to define clinically relevant risk factors. Recent studies have demonstrated that microarray data can successfully cluster patients into low- and high-risk categories. However, the need exists for models which examine how genomic predictors interact with existing clinical factors and provide personalized outcome predictions. We have developed clinico-genomic tree models for survival outcomes which use recursive partitioning to subdivide the current data set into homogeneous subgroups of patients, each with a specific Weibull survival distribution. These trees can provide personalized predictive distributions of the probability of survival for individuals of interest. Our strategy is to fit multiple models; within each model we adopt a prior on the Weibull scale parameter and update this prior via Empirical Bayes whenever the sample is split at a given node. The decision to split is based on a Bayes factor criterion. The resulting trees are weighted according to their relative likelihood values and predictions are made by averaging over models. In a pilot study of survival in advanced stage ovarian cancer we demonstrate that clinical and genomic data are complementary sources of information relevant to survival, and we use the exploratory nature of the trees to identify potential genomic biomarkers worthy of further study.

“…In the case of genomic data these combinations can then serve as a basis for further biological study. Recent additions to the survival tree modeling literature, including [26,27] and [33], reflect the importance of survival trees as an analytic technique for data sets with complex structure.…”

Section: Regression Trees (mentioning)

confidence: 99%

“…We describe below random survival forests, which performed among the best in simulations. 15 Several other machine learning methods are presented in Supplementary materials. The goal of these machine learning methods is to identify pathways containing SNPs that can predict the survival outcome of the population of interest.…”

Section: Methods (mentioning)

confidence: 99%

“…One of the popular variants is random survival forests. 15 A random survival forests encompasses many binary trees, each of which is formed by a deterministic algorithm. First, a best binary split is chosen using a subset of SNPs within a pathway.…”

Section: Random Survival Forests (mentioning)

confidence: 99%

Pathway-based identification of SNPs predictive of survival

2011

In recent years, several association analysis methods for case-control studies have been developed. However, as we turn towards the identification of single nucleotide polymorphisms (SNPs) for prognosis, there is a need to develop methods for the identification of SNPs in high dimensional data with survival outcomes. Traditional methods for the identification of SNPs have some drawbacks. First, the majority of the approaches for case-control studies are based on single SNPs. Second, SNPs that are identified without incorporating biological knowledge are more difficult to interpret. Random forests has been found to perform well in gene expression analysis with survival outcomes. In this paper we present the first pathway-based method to correlate SNP with survival outcomes using a machine learning algorithm. We illustrate the application of pathway-based analysis of SNPs predictive of survival with a data set of 192 multiple myeloma patients genotyped for 500 000 SNPs. We also present simulation studies that show that the random forests technique with log-rank score split criterion outperforms several other machine learning algorithms. Thus, pathway-based survival analysis using machine learning tools represents a promising approach for the identification of biologically meaningful SNPs associated with disease.

Predictive Accuracy of Stroke Risk Prediction Models Across Black and White Race, Sex, and Age Groups

Hong, Pencina, Wojdyla, et al. (2023). JAMA

Importance: Stroke is the fifth-highest cause of death in the US and a leading cause of serious long-term disability with particularly high risk in Black individuals. Quality risk prediction algorithms, free of bias, are key for comprehensive prevention strategies.

Objective: To compare the performance of stroke-specific algorithms with pooled cohort equations developed for atherosclerotic cardiovascular disease for the prediction of new-onset stroke across different subgroups (race, sex, and age) and to determine the added value of novel machine learning techniques.

Design, Setting, and Participants: Retrospective cohort study on combined and harmonized data from Black and White participants of the Framingham Offspring, Atherosclerosis Risk in Communities (ARIC), Multi-Ethnic Study for Atherosclerosis (MESA), and Reasons for Geographical and Racial Differences in Stroke (REGARDS) studies (1983-2019) conducted in the US. The 62 482 participants included at baseline were at least 45 years of age and free of stroke or transient ischemic attack.

Exposures: Published stroke-specific algorithms from Framingham and REGARDS (based on self-reported risk factors) as well as pooled cohort equations for atherosclerotic cardiovascular disease plus 2 newly developed machine learning algorithms.

Main Outcomes and Measures: Models were designed to estimate the 10-year risk of new-onset stroke (ischemic or hemorrhagic). Discrimination concordance index (C index) and calibration ratios of expected vs observed event rates were assessed at 10 years. Analyses were conducted by race, sex, and age groups.

Results: The combined study sample included 62 482 participants (median age, 61 years, 54% women, and 29% Black individuals). Discrimination C indexes were not significantly different for the 2 stroke-specific models (Framingham stroke, 0.72; 95% CI, 0.72-0.73; REGARDS self-report, 0.73; 95% CI, 0.72-0.74) vs the pooled cohort equations (0.72; 95% CI, 0.71-0.73): differences 0.01 or less (P values >.05) in the combined sample. Significant differences in discrimination were observed by race: the C indexes were 0.76 for all 3 models in White vs 0.69 in Black women (all P values <.001) and between 0.71 and 0.72 in White men and between 0.64 and 0.66 in Black men (all P values ≤.001). When stratified by age, model discrimination was better for younger (<60 years) vs older (≥60 years) adults for both Black and White individuals. The ratios of observed to expected 10-year stroke rates were closest to 1 for the REGARDS self-report model (1.05; 95% CI, 1.00-1.09) and indicated risk overestimation for Framingham stroke (0.86; 95% CI, 0.82-0.89) and pooled cohort equations (0.74; 95% CI, 0.71-0.77). Performance did not significantly improve when novel machine learning algorithms were applied.

Conclusions and Relevance: In this analysis of Black and White individuals without stroke or transient ischemic attack among 4 US cohorts, existing stroke-specific risk prediction models and novel machine learning techniques did not significantly improve discriminative accuracy for new-onset stroke compared with the pooled cohort equations, and the REGARDS self-report model had the best calibration. All algorithms exhibited worse discrimination in Black individuals than in White individuals, indicating the need to expand the pool of risk factors and improve modeling techniques to address observed racial disparities and improve model performance.
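The abstract above reports discrimination as a concordance index (C index) but does not say which estimator was used. As a rough illustration only, the sketch below computes Harrell's C for right-censored data, which is one common choice and an assumption here. The function name `concordance_index` and the toy inputs are hypothetical, and pairs with tied follow-up times are simply skipped to keep the example short.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs that the risk score orders correctly.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    risks  -- predicted risk score (higher means the event is expected sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplification: skip tied times. A pair is comparable only when
        # the earlier time is an observed event, not a censoring time.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): four subjects, one censored.
times = [5, 8, 12, 3]
events = [1, 0, 1, 1]
risks = [0.7, 0.4, 0.2, 0.9]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for reported values such as 0.72 or 0.76.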
