Background and Aims
Random Forest (RF) is a machine learning technique, within the field of Artificial Intelligence (AI), that builds an ensemble of decision trees, each fitted to a bootstrap sample of the data (resampling with replacement). At each node a random subset of the predictor variables is drawn, and the best cut point is determined among them. The trees are grown as deep as possible, without pruning. Approximately 37% of the observations are left out of the construction of each tree; these form the out-of-bag (OOB) sample, which is used to obtain an honest estimate of the predictive capacity of the model, so a separate validation sample is not strictly required. Each analysis fits a few hundred regression or classification trees, depending on whether the response variable is numerical or qualitative, respectively. The final prediction aggregates the predictions of the individual trees (bagging): their average for regression, or a majority vote for classification. RF also provides a measure of the importance of each predictor variable, which can later be used to select variables for a multivariate regression model.
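The figure of roughly 37% follows from the bootstrap itself: when n rows are drawn with replacement from n rows, a given row is missed with probability (1 - 1/n)^n, which tends to 1/e (about 0.368) as n grows. The sketch below checks this by simulation. It is only an illustration, not the authors' code: the function names, the seed and the number of simulated trees are assumptions, and only the row count is taken from the Method section.

```python
import math
import random


def oob_fraction(n_rows: int, rng: random.Random) -> float:
    """Fraction of rows never drawn in one bootstrap sample of size n_rows."""
    drawn = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1.0 - len(drawn) / n_rows


def expected_oob_fraction(n_rows: int) -> float:
    """Exact probability that a given row is left out: (1 - 1/n)^n."""
    return (1.0 - 1.0 / n_rows) ** n_rows


rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
n_rows = 14750          # number of records reported in the Method section
n_trees = 50            # assumption for a quick run; the abstract uses a few hundred

# Average the simulated out-of-bag fraction over several bootstrap samples.
simulated = sum(oob_fraction(n_rows, rng) for _ in range(n_trees)) / n_trees

print(f"simulated OOB fraction: {simulated:.3f}")
print(f"(1 - 1/n)^n:            {expected_oob_fraction(n_rows):.3f}")
print(f"limit 1/e:              {math.exp(-1):.3f}")
```

Because each tree leaves out a different third or so of the rows, every observation can be predicted by the trees that never saw it, which is what makes the OOB estimate behave like a built-in validation.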
Method
We analyzed 14,750 records from 2011 to 2014 held in the Information System of the Autonomous Transplant Coordination of Andalusia (SICATA), a system that includes clinical and epidemiological variables as well as data on anemia, bone metabolism, dialysis adequacy and vascular access. A total of 1,911 patients presented the event of interest (death). Three predictive and explanatory models of survival were developed: (1) RF; (2) multivariate logistic regression; (3) multivariate logistic regression including the variables identified as important by the preceding RF model. We compared them in terms of discrimination (area under the ROC curve, AUC).
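The AUC of the ROC curve used for this comparison can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived. A minimal sketch of that pairwise definition follows. The function name `roc_auc` and the toy labels and scores are invented for illustration; they are not SICATA data and do not reproduce the reported results.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration: 1 = died, 0 = survived.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.05]   # hypothetical predicted risks
model_b = [0.6, 0.3, 0.2, 0.5, 0.4, 0.25, 0.1, 0.05]   # hypothetical predicted risks

print(f"model A AUC: {roc_auc(labels, model_a):.2f}")  # 14 of 15 pairs -> 0.93
print(f"model B AUC: {roc_auc(labels, model_b):.2f}")  # 10 of 15 pairs -> 0.67
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which is the scale on which the three models are compared.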
Results
The AUC of the ROC curve of the multivariate model without prior RF was 0.75. The AUC of the ROC curve of the multivariate model with prior RF was 0.81. The AUC of the ROC curve of the Random Forest model was 0.98.
Conclusion
The Random Forest model achieved a discrimination (AUC) of 0.98 for mortality in patients on hemodialysis, far superior to that of the classic multivariate analyses. The multivariate logistic regression fitted with the variables identified as important by RF improved on the AUC of the model without them (0.81 vs. 0.75).