We examine whether the popular 2/3 rule-of-thumb splitting criterion used in the out-of-sample evaluation of predictive econometric and machine learning models is justified. We conduct simulations of the predictive performance of logistic regression and decision tree algorithms under varying splitting proportions as well as sample sizes. Our non-exhaustive repeated random sub-sampling simulation approach, known as Monte Carlo cross-validation, indicates that while the 2/3 rule-of-thumb works, a range of other splitting proportions yields equally compelling results. Furthermore, our results indicate that the size of the complete sample has little impact on the applicability of the 2/3 rule-of-thumb. However, our analysis reveals that when the training sample is relatively small or relatively large in relation to the complete sample, the variation in predictive accuracy can lead to misleading results. Our results are especially important for IS researchers considering the use of out-of-sample methods for evaluating their predictive models.
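To make the evaluation design concrete, the following is a minimal sketch (not the authors' exact setup) of Monte Carlo cross-validation, i.e., repeated random sub-sampling over a grid of train/test split proportions, comparing logistic regression and a decision tree. The synthetic dataset, the choice of accuracy as the metric, and the number of repeats are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the study's data (assumption for illustration only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Splitting proportions to compare, including the 2/3 rule-of-thumb.
train_fractions = [0.1, 0.2, 0.3, 0.5, 2 / 3, 0.8, 0.9]
n_repeats = 100  # number of random sub-samples per proportion

for frac in train_fractions:
    splitter = ShuffleSplit(n_splits=n_repeats, train_size=frac, random_state=0)
    for name, model in models.items():
        scores = []
        for train_idx, test_idx in splitter.split(X):
            model.fit(X[train_idx], y[train_idx])
            scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
        # Mean and spread of out-of-sample accuracy across the repeated splits.
        print(f"train={frac:.2f}  {name}: "
              f"mean acc={np.mean(scores):.3f}, sd={np.std(scores):.3f}")
```

Comparing the mean accuracy and its standard deviation across split proportions mirrors the abstract's two questions: whether 2/3 is uniquely good, and whether extreme training-sample proportions inflate the variability of the accuracy estimate.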