This study used student interaction data from an online learning environment to examine whether end-of-term academic performance could be predicted in the earlier weeks of the term. The study was carried out with 76 second-year university students registered in a Computer Hardware course. It aimed to answer two principal questions: first, which algorithms and features best predict students' end-of-term academic performance, as determined by comparing different classification algorithms and pre-processing techniques; and second, whether academic performance can be predicted in the earlier weeks using these features and the selected algorithm. The results indicated that the kNN algorithm correctly predicted students who were unsuccessful at the end of term at a rate of 89%. When data obtained in weeks 3, 6, 9, 12, and 14 were analyzed to determine whether end-of-term performance could be predicted earlier, students who were unsuccessful at the end of term could be predicted at a rate of 74% as early as week 3. These findings are important for determining the features, and the indicators of student success, to be used in early warning systems developed for online learning systems. They will also aid researchers in selecting algorithms and pre-processing techniques for the analysis of educational data.
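As a rough illustration of the classifier named above, the following is a minimal kNN sketch in plain Python. The abstract does not specify the features, the value of k, or the distance metric, so the two features (weekly logins, hours of activity), k = 3, Euclidean distance, and all data values here are assumptions made up for the example, not the study's data or configuration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    Distance is Euclidean (an assumption; the abstract names no metric).
    """
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point's distance to the query with its label.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (weekly logins, hours of activity) per student.
train_X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
train_y = ["fail", "fail", "fail", "pass", "pass", "pass"]

low_activity = knn_predict(train_X, train_y, (2.0, 2.0), k=3)
high_activity = knn_predict(train_X, train_y, (8.5, 8.5), k=3)
```

With this toy data, the low-activity query is labeled `fail` and the high-activity query `pass`. In a week-by-week setting, the same function could be applied to features accumulated up to week 3, 6, 9, and so on, which is one plausible reading of the early-prediction procedure rather than a confirmed description of it.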
Early prediction systems have already been applied successfully in various educational contexts. In this study, we investigated the development of an early prediction system in the context of eBook-based teaching and learning, and used students' eBook reading data to develop an early warning system for students at risk of academic failure, that is, students whose academic performance is low. To determine the best-performing model and the optimum time for possible interventions, we created prediction models using 13 prediction algorithms with data from different weeks of the course. We also tested the effects of data transformation on the prediction models. 10-fold cross-validation was used for all prediction models, and Accuracy and Kappa metrics were used to compare their performance. Our results revealed that, in a sixteen-week course, all models reached their highest performance with the data from the 15th week. On the other hand, starting from the 3rd week, the models classified low- and high-performing students with an accuracy of over 79%. In terms of algorithms, Random Forest (RF) outperformed the other algorithms when raw data were used; with the transformed data, however, the J48 algorithm performed better. When categorical data were used, Naive Bayes (NB) outperformed the other algorithms. Results also indicated that models with transformed data performed worse than models created using categorical data, whereas models with categorical data performed similarly to models with raw data. The implications of these results are also discussed with respect to the field of Learning Analytics.
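The two comparison metrics mentioned above can be sketched in a few lines of plain Python. Accuracy is the share of matching labels, and Cohen's kappa corrects that share for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The labels below are invented for illustration and are not taken from the study; returning 0.0 when kappa is undefined is a convention chosen for this sketch only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability both assign the same class independently.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when only one class ever occurs;
        # this sketch returns 0.0 by convention (an assumption).
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for ten students (not the study's data).
y_true = ["low"] * 4 + ["high"] * 6
y_pred = ["low", "low", "low", "high", "high", "high", "high", "high", "high", "low"]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

Here eight of ten labels agree, so accuracy is 0.8, while chance agreement is 0.52, giving a kappa of about 0.583. Reporting both is useful when classes are imbalanced, since accuracy alone can look high for a model that mostly predicts the majority class.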