The large volume and complexity of data in educational institutions call for support from information technologies. To facilitate this task, many researchers have focused on using machine learning to extract knowledge from educational databases and help students and instructors achieve better performance. In prediction models, the challenging task is to choose effective techniques that produce satisfactory predictive accuracy. Hence, in this work, we introduce a hybrid approach that uses principal component analysis (PCA) in conjunction with four machine learning (ML) algorithms: random forest (RF), the C5.0 decision tree (DT), naïve Bayes (NB) from the Bayesian network family, and support vector machine (SVM), to improve classification performance by addressing the misclassification problem. Three datasets were used to confirm the robustness of the proposed models. On these datasets, we used classification accuracy and root mean square error (RMSE) as the evaluation metrics for the proposed models. For this classification problem, 10-fold cross-validation was used to evaluate predictive performance. The proposed hybrid models produced prediction results that showed them to be the optimal prediction and classification algorithms.
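The evaluation protocol described above (10-fold cross-validation scored by accuracy and RMSE) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the PCA step and the four classifiers are replaced by a placeholder majority-class model, class labels are assumed to be encoded as 0/1 so that RMSE is defined, and every function name and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between numeric labels and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def majority_class_fit(X_train, y_train):
    """Placeholder model (assumption): always predict the most frequent label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda X: [majority] * len(X)


def cross_validate(X, y, fit, k=10, seed=0):
    """Return one (accuracy, RMSE) pair per held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in held_out]
        y_pred = predict([X[i] for i in held_out])
        scores.append((accuracy(y_true, y_pred), rmse(y_true, y_pred)))
    return scores


# Hypothetical toy data: 50 samples, 40 labelled 1 and 10 labelled 0.
X = [[i % 7, i % 3] for i in range(50)]
y = [1 if i % 5 else 0 for i in range(50)]

scores = cross_validate(X, y, majority_class_fit)
mean_accuracy = sum(a for a, _ in scores) / len(scores)
mean_rmse = sum(r for _, r in scores) / len(scores)
print(f"mean accuracy = {mean_accuracy:.3f}, mean RMSE = {mean_rmse:.3f}")
```

Any model that exposes the same `fit(X_train, y_train) -> predict` shape could be passed in place of `majority_class_fit`, including one that applies a dimensionality reduction step before classification.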
Educational institutions seek to investigate students' learning behaviors and to predict student outcomes early, so that they can intervene and improve learning performance. Educational data mining (EDM) offers various effective models for predicting student performance. Feature selection (FS) is an EDM method used to determine the dominant factors that are necessary and sufficient for the target concept. FS extracts high-quality data, which reduces the complexity of the prediction task and can increase the robustness of the decision rules. In this paper, we provide a comparative study of feature selection methods for determining the dominant factors that strongly affect classification performance and for improving the performance of prediction models. A new feature selection method, CHIMI, based on a ranked vector score, is proposed. The feature sets produced by each FS method are analyzed to obtain the dominant set. The experimental results show that using the dominant set from the proposed CHIMI method significantly improves the classification performance of the proposed models.
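One way to picture a rank-based combination of chi-square and mutual information scores is the sketch below. The abstract does not specify how CHIMI combines the two rankings, so the mean-rank rule used here is purely an assumption for illustration, as are the function names and the toy student features. Features are assumed to be categorical.

```python
import math
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    f_counts, l_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n
            stat += (joint[(f, l)] - expected) ** 2 / expected
    return stat


def mutual_information(feature, labels):
    """Mutual information (in nats) between a categorical feature and the labels."""
    n = len(labels)
    f_counts, l_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    return sum(
        (c / n) * math.log(c * n / (f_counts[f] * l_counts[l]))
        for (f, l), c in joint.items()
    )


def rank(scores):
    """Map each feature name to its rank; rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def combined_ranking(features, labels):
    """Order features by the mean of their chi-square and MI ranks.

    The mean-rank rule is an assumption, not the published CHIMI definition.
    """
    chi = {name: chi_square(col, labels) for name, col in features.items()}
    mi = {name: mutual_information(col, labels) for name, col in features.items()}
    chi_rank, mi_rank = rank(chi), rank(mi)
    mean_rank = {name: (chi_rank[name] + mi_rank[name]) / 2 for name in features}
    return sorted(features, key=mean_rank.get)


# Hypothetical toy data: eight students, three categorical features.
labels = ["pass", "pass", "pass", "pass", "fail", "fail", "fail", "fail"]
features = {
    "gender": ["m", "f", "m", "f", "m", "f", "m", "f"],
    "study_time": ["long", "long", "long", "short", "long", "short", "short", "short"],
    "attendance": ["high", "high", "high", "high", "low", "low", "low", "low"],
}

ranking = combined_ranking(features, labels)
dominant_set = ranking[:2]  # keep the two top-ranked features
print(ranking)
```

In this toy data `attendance` separates the classes perfectly and `gender` is independent of the label, so both scores agree on the ordering; on real data the two rankings can disagree, which is where the choice of combination rule matters.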
The primary goal of educational systems is to enrich the quality of education by maximizing good results and minimizing the failure rate of poor-performing students. Early prediction of student performance has become a challenging task in improving and developing academic performance. Educational data mining is an effective discipline of data mining concerned with information from the education domain. The aim of this work is to propose educational data mining techniques and integrate them into a web-based system for predicting poor-performing students. A comparative study of prediction models was conducted. Subsequently, high-performing models were developed to achieve higher performance. The hybrid random forest, named Hybrid RF, produces the most successful classification. To support intervention and improve learning outcomes, a novel feature selection method named MICHI, which combines the mutual information and chi-square algorithms based on ranked feature scores, is introduced to select a dominant set and improve the performance of the prediction models. Using the proposed educational data mining techniques, an academic performance prediction system is then developed so that educational stakeholders can obtain early predictions of student learning outcomes for timely intervention. Experimental results and evaluation surveys report the effectiveness and usefulness of the developed academic prediction system. The system is used to help educational stakeholders intervene and improve student performance.