The prediction of early childhood numeracy skills development is often studied by determining the learner’s performance in a numeracy test. It is an important area of study because numeracy affects the learner’s mathematical and statistical abilities later in life. Although each classification algorithm has advantages and disadvantages relative to the others, these algorithms are often applied to the prediction of early childhood numeracy skills development without justifying the choice of one algorithm over another. In this paper, the bi-directional stepwise logistic regression model (SLRM), hierarchical logistic regression model (HLRM), classification and regression tree (CART) and Naïve Bayes (NB) algorithm were compared in terms of their ability to accurately classify a sample of learners as having passed or failed a numeracy test. The intention was to determine which of the four algorithms most accurately predicts the learner’s performance in a numeracy test. The algorithms were compared using the true positive rate (sensitivity), the true negative rate (specificity), the classification error, the classification accuracy and the area under the receiver operating characteristic (AUROC) curve. The results showed that the HLRM, which several previous studies have applied to the prediction of numeracy test competence, is the best classifier, followed by the SLRM, CART and NB, in that order. The study also confirmed some important predictors of the learner’s performance in a numeracy test, some of which had also been identified by previous studies on early childhood numeracy development. Gaps and recommendations for future research on the classification algorithms, as well as implications for practice, are also highlighted. We have made the HLRM scoring algorithm generated from SPSS available as supplementary material; it can be used to classify a new set of learners into either the pass or the fail group.
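The comparison criteria listed above can be made concrete with a short sketch. It assumes each classifier outputs a predicted probability of passing and that a learner is labelled "pass" when that probability is at least 0.5; the function names, the 0.5 cut-off and the toy data are illustrative assumptions, not details taken from the study. The AUROC is computed in its rank form: the probability that a randomly chosen pass case receives a higher score than a randomly chosen fail case.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count TP, FN, TN, FP, labelling a case 'pass' (1) when its score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def auroc(y_true, scores):
    """AUROC as P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Return the threshold-based metrics plus the threshold-free AUROC."""
    tp, fn, tn, fp = confusion_counts(y_true, scores, threshold)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": accuracy,
        "error": 1.0 - accuracy,  # classification error
        "auroc": auroc(y_true, scores),
    }


if __name__ == "__main__":
    # Hypothetical labels (1 = pass) and predicted pass probabilities.
    y = [1, 1, 1, 0, 0, 0]
    p = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(evaluate(y, p))
```

For the toy data, sensitivity, specificity and accuracy are all 2/3 at the 0.5 cut-off, while the AUROC is 8/9. Because the AUROC does not depend on a single threshold, it can rank two classifiers differently from accuracy, which is one reason to report both.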
Despite growing criminal activity in South Africa, many victims still do not report crimes, so there was a need to understand the determinants of the likelihood of reporting a crime in the country. Binary logistic regression is a supervised machine learning algorithm that can help predict the likelihood of reporting a crime, but the choice of variables to include in the model varies from one author to another. Selecting theoretically sound and statistically relevant independent variables is key to achieving parsimonious multivariate models. This study sought to test the efficiency of some commonly used variable selection methods for logistic regression models in order to identify the most relevant determinants of the likelihood of reporting a crime of housebreaking. The study used 17 candidate variables, such as the victims’ demographic characteristics and their perceptions of the police. The multivariate model fitted using stepwise selection was found to be the best fit for the data, having the lowest Akaike information criterion (AIC), the highest classification accuracy rate and the highest area under the receiver operating characteristic curve. The model fitted using the Hosmer-Lemeshow (H-L) algorithm was the worst fit for the data. The study also revealed a limitation of the stepwise selection method: it may select different independent variables for different random samples of observations drawn from the same dataset. Finally, the study established a multivariate logistic regression model to predict the likelihood of a victim reporting a crime of housebreaking, together with its determinants.