Background and goal: The Random Forest (RF) algorithm for regression and classification has gained considerable popularity since its introduction in 2001. Meanwhile, it has grown into a standard classification approach competing with logistic regression (LR) in many innovation-friendly scientific fields.

Results: In this context, we present a large-scale benchmarking experiment based on 243 real datasets comparing the prediction performance of the original version of RF with default parameters and LR as binary classification tools. Most importantly, the design of our benchmark experiment is inspired by clinical trial methodology, thus avoiding common pitfalls and major sources of bias.

Conclusion: RF performed better than LR according to the considered accuracy measure in approximately 69% of the datasets. The mean difference between RF and LR was 0.029 (95%-CI = [0.022, 0.038]) for the accuracy, 0.041 (95%-CI = [0.031, 0.053]) for the Area Under the Curve, and −0.027 (95%-CI = [−0.034, −0.021]) for the Brier score, all measures thus suggesting a significantly better performance of RF. As a side result of our benchmarking experiment, we observed that the results were noticeably dependent on the inclusion criteria used to select the example datasets, thus emphasizing the importance of clear statements regarding this dataset selection process. We also stress that neutral studies similar to ours, based on a high number of datasets and carefully designed, will be necessary in the future to evaluate further variants, implementations or parameters of random forests which may yield improved accuracy compared to the original version with default values.

Electronic supplementary material: The online version of this article (10.1186/s12859-018-2264-5) contains supplementary material, which is available to authorized users.
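The mean differences and 95% confidence intervals reported in the abstract above summarize paired, per-dataset differences between RF and LR. The following is a minimal sketch of how such a summary could be computed, assuming a percentile bootstrap over datasets; the abstract does not state how its intervals were obtained. The function name `bootstrap_mean_ci` is hypothetical and the accuracy values are synthetic placeholders, not the study's data.

```python
import random
import statistics


def bootstrap_mean_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Mean of paired differences with a percentile-bootstrap CI.

    Hypothetical helper: the resampling scheme is an assumption,
    not the procedure confirmed by the abstract.
    """
    rng = random.Random(seed)
    n = len(diffs)
    # Resample datasets with replacement and record the mean each time.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), lower, upper


# Synthetic placeholder accuracies for 243 datasets (NOT the real results).
rng = random.Random(42)
acc_lr = [rng.uniform(0.6, 0.9) for _ in range(243)]
acc_rf = [min(1.0, a + rng.gauss(0.03, 0.05)) for a in acc_lr]

# Paired difference per dataset: positive means RF was more accurate.
diffs = [rf - lr for rf, lr in zip(acc_rf, acc_lr)]

mean_diff, ci_lower, ci_upper = bootstrap_mean_ci(diffs)
share_rf_better = sum(d > 0 for d in diffs) / len(diffs)

print(f"mean difference: {mean_diff:.3f}")
print(f"95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
print(f"share of datasets where RF is better: {share_rf_better:.2f}")
```

Resampling whole datasets rather than individual observations treats each dataset as one unit of analysis, which matches the per-dataset framing of the reported figures.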
We performed a systematic review of studies focusing on the automatic prediction of the progression of mild cognitive impairment to Alzheimer's disease (AD) dementia, and a quantitative analysis of the methodological choices impacting performance. This review included 172 articles, from which 234 experiments were extracted. For each of them, we reported the data set used, the feature types, the algorithm type, performance and potential methodological issues. The impact of these characteristics on the performance was evaluated using multivariate mixed-effect linear regressions. We found that using cognitive, fluorodeoxyglucose-positron emission tomography or potentially electroencephalography and magnetoencephalography variables significantly improved predictive performance compared to not including them, whereas including other modalities, in particular T1 magnetic resonance imaging, did not show a significant effect. The good performance of cognitive assessments calls into question the wide use of imaging for predicting the progression to AD and advocates for further exploring fine domain-specific cognitive assessments. We also identified several methodological issues, including the absence of a test set, or its use for feature selection or parameter tuning in nearly a fourth of the papers. Other issues, found in 15% of the studies, cast doubt on the relevance of the method to clinical practice. We also highlight that short-term predictions are likely not to be better than predicting that subjects stay stable over