Purpose
The purpose of this paper is to empirically investigate and compare the use of multiple data sources, different classifiers and ensembles of classifiers technique in predicting student academic performance. The study will compare the performance and efficiency of ensemble techniques that make use of different combination of data sources with that of base classifiers with single data source.
Design/methodology/approach
Using a quantitative research methodology, data samples of 141 learners enrolled in the University of the West of Scotland were extracted from the institution’s databases and also collected through survey questionnaire. The research focused on three data sources: student record system, learning management system and survey, and also used three state-of-art data mining classifiers, namely, decision tree, artificial neural network and support vector machine for the modeling. In addition, the ensembles of these base classifiers were used in the student performance prediction and the performances of the seven different models developed were compared using six different evaluation metrics.
Findings
The results show that the approach of using multiple data sources along with heterogeneous ensemble techniques is very efficient and accurate in prediction of student performance as well as help in proper identification of student at risk of attrition.
Practical implications
The approach proposed in this study will help the educational administrators and policy makers working within educational sector in the development of new policies and curriculum on higher education that are relevant to student retention. In addition, the general implications of this research to practice is its ability to accurately help in early identification of students at risk of dropping out of HE from the combination of data sources so that necessary support and intervention can be provided.
Originality/value
The research empirically investigated and compared the performance accuracy and efficiency of single classifiers and ensemble of classifiers that make use of single and multiple data sources. The study has developed a novel hybrid model that can be used for predicting student performance that is high in accuracy and efficient in performance. Generally, this research study advances the understanding of the application of ensemble techniques to predicting student performance using learner data and has successfully addressed these fundamental questions: What combination of variables will accurately predict student academic performance? What is the potential of the use of stacking ensemble techniques in accurately predicting student academic performance?