BackgroundRandom survival forest (RSF) models have been identified as alternative methods to the Cox proportional hazards model in analysing time-to-event data. These methods, however, have been criticised for the bias that results from favouring covariates with many split-points and hence conditional inference forests for time-to-event data have been suggested. Conditional inference forests (CIF) are known to correct the bias in RSF models by separating the procedure for the best covariate to split on from that of the best split point search for the selected covariate.MethodsIn this study, we compare the random survival forest model to the conditional inference model (CIF) using twenty-two simulated time-to-event datasets. We also analysed two real time-to-event datasets. The first dataset is based on the survival of children under-five years of age in Uganda and it consists of categorical covariates with most of them having more than two levels (many split-points). The second dataset is based on the survival of patients with extremely drug resistant tuberculosis (XDR TB) which consists of mainly categorical covariates with two levels (few split-points).ResultsThe study findings indicate that the conditional inference forest model is superior to random survival forest models in analysing time-to-event data that consists of covariates with many split-points based on the values of the bootstrap cross-validated estimates for integrated Brier scores. However, conditional inference forests perform comparably similar to random survival forests models in analysing time-to-event data consisting of covariates with fewer split-points.ConclusionAlthough survival forests are promising methods in analysing time-to-event data, it is important to identify the best forest model for analysis based on the nature of covariates of the dataset in question.
BackgroundInfant and child mortality rates are among the health indicators of importance in a given community or country. It is the fourth millennium development goal that by 2015, all the United Nations member countries are expected to have reduced their infant and child mortality rates by two-thirds.Uganda is one of those countries in Sub-Saharan Africa with high infant and child mortality rates, therefore it is important to use sound statistical methods to determine which factors are strongly associated with child mortality which in turn will help inform the design of intervention strategiesMethodsThe Uganda Demographic Health Survey (UDHS) funded by USAID, UNFPA, UNICEF, Irish Aid and the United Kingdom government provides a data set which is rich in information on child mortality or survival. Survival analysis techniques are among the well-developed methods in Statistics for analysing time to event data. These methods were adopted in this paper to examine factors affecting under-five child mortality rates (UMR) in Uganda using the UDHS data for 2011 in R and STATA software.ResultsResults obtained by fitting the Cox-proportional hazard model with frailty effects and drawing inference using both the frequentists and Bayesian approaches at 5 % significance level, show evidence of the existence of unobserved heterogeneity at the household level but there was not enough evidence to conclude the existence of unobserved heterogeneity at the community level. Sex of the household head, sex of the child and number of births in the past one year were found to be significant. The results further suggest that over the period of 1990–2015, Uganda reduced its UMR by 52 % .ConclusionUganda has not achieved the MDG4 target but the 52 % reduction in the UMR is a move in the positive direction. Demographic factors (sex of the household head) and Biological determinants (sex of the child and number of births in the past one year) are strongly associated with high UMR. Heterogeneity or unobserved covariates were found to be significant at the household but insignificant at the community level.
BackgroundUganda just like any other Sub-Saharan African country, has a high under-five child mortality rate. To inform policy on intervention strategies, sound statistical methods are required to critically identify factors strongly associated with under-five child mortality rates. The Cox proportional hazards model has been a common choice in analysing data to understand factors strongly associated with high child mortality rates taking age as the time-to-event variable. However, due to its restrictive proportional hazards (PH) assumption, some covariates of interest which do not satisfy the assumption are often excluded in the analysis to avoid mis-specifying the model. Otherwise using covariates that clearly violate the assumption would mean invalid results.MethodsSurvival trees and random survival forests are increasingly becoming popular in analysing survival data particularly in the case of large survey data and could be attractive alternatives to models with the restrictive PH assumption. In this article, we adopt random survival forests which have never been used in understanding factors affecting under-five child mortality rates in Uganda using Demographic and Health Survey data. Thus the first part of the analysis is based on the use of the classical Cox PH model and the second part of the analysis is based on the use of random survival forests in the presence of covariates that do not necessarily satisfy the PH assumption.ResultsRandom survival forests and the Cox proportional hazards model agree that the sex of the household head, sex of the child, number of births in the past 1 year are strongly associated to under-five child mortality in Uganda given all the three covariates satisfy the PH assumption. Random survival forests further demonstrated that covariates that were originally excluded from the earlier analysis due to violation of the PH assumption were important in explaining under-five child mortality rates. These covariates include the number of children under the age of five in a household, number of births in the past 5 years, wealth index, total number of children ever born and the child’s birth order. The results further indicated that the predictive performance for random survival forests built using covariates including those that violate the PH assumption was higher than that for random survival forests built using only covariates that satisfy the PH assumption.ConclusionsRandom survival forests are appealing methods in analysing public health data to understand factors strongly associated with under-five child mortality rates especially in the presence of covariates that violate the proportional hazards assumption.Electronic supplementary materialThe online version of this article (doi:10.1186/s13104-017-2775-6) contains supplementary material, which is available to authorized users.
ObjectivesWe used machine learning algorithms to track how the ranks of importance and the survival outcome of four socioeconomic determinants (place of residence, mother’s level of education, wealth index and sex of the child) of under-5 mortality rate (U5MR) in sub-Saharan Africa have evolved.SettingsThis work consists of multiple cross-sectional studies. We analysed data from the Demographic Health Surveys (DHS) collected from four countries; Uganda, Zimbabwe, Chad and Ghana, each randomly selected from the four subregions of sub-Saharan Africa.ParticipantsEach country has multiple DHS datasets and a total of 11 datasets were selected for analysis. A total of n=85 688 children were drawn from the eleven datasets.Primary and secondary outcomesThe primary outcome variable is U5MR; the secondary outcomes were to obtain the ranks of importance of the four socioeconomic factors over time and to compare the two machine learning models, the random survival forest (RSF) and the deep survival neural network (DeepSurv) in predicting U5MR.ResultsMother’s education level ranked first in five datasets. Wealth index ranked first in three, place of residence ranked first in two and sex of the child ranked last in most of the datasets. The four factors showed a favourable survival outcome over time, confirming that past interventions targeting these factors are yielding positive results. The DeepSurv model has a higher predictive performance with mean concordance indexes (between 67% and 80%), above 50% compared with the RSF model.ConclusionsThe study reveals that children under the age of 5 in sub-Saharan Africa have favourable survival outcomes associated with the four socioeconomic factors over time. It also shows that deep survival neural network models are efficient in predicting U5MR and should, therefore, be used in the big data era to draft evidence-based policies to achieve the third sustainable development goal.
Research that seeks to compare two predictive models requires a thorough statistical approach to draw valid inferences about comparisons between the performance of the two models. Researchers present estimates of model performance with little evidence on whether they reflect true differences in model performance. In this study, we apply two statistical tests, that is, the 5 × 2-fold cv paired t-test, and the combined 5 × 2-fold cv F-test to provide statistical evidence on differences in predictive performance between the Fine-Gray (FG) and random survival forest (RSF) models for competing risks. These models are trained on different scenarios of low-dimensional simulated survival data to determine whether the differences in their predictive performance that exist are indeed significant. Each simulation was repeated one hundred times on ten different seeds. The results indicate that the RSF model is superior in predictive performance in the presence of complex relationships (quadratic and interactions) between the outcome and its predictors. The two statistical tests show that the differences in performance are significant in quadratic simulation but not significant in interaction simulations. The study has also revealed that the FG model is superior in predictive performance in linear simulations and its differences in predictive performance compared to the RSF model are significant. The combined 5 × 2-fold cv F-test has lower type I error rates compared to the 5 × 2-fold cv paired t-test.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.