The main objective of this work is to conduct a systematic literature review on predicting the academic performance of university students with data mining techniques. To this end, an exhaustive search was carried out and, after analyzing the collected documents, the following aspects were considered: methodology, attributes, attribute selection algorithms, techniques, tools, and metrics. These aspects served as the basis for this document. The results of the study showed that the most widely used methodology is KDD (Knowledge Discovery in Databases); the most important attribute for prediction is CGPA (cumulative grade point average, i.e., prior academic performance); the most commonly used variable selection algorithm is InfoGainAttributeEval; among the most efficient techniques are Naïve Bayes, Neural Networks (MLP), and Decision Trees (J48); the most widely used tool for developing the models is the Weka software; and, finally, the metrics used to determine the effectiveness of the models were Precision and Recall.
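The review names Weka's InfoGainAttributeEval as the most common variable selection algorithm. As a rough illustration of what such a ranker computes, the minimal Python sketch below scores each attribute by information gain, the drop in class entropy H(C) - H(C | A) after splitting on attribute A, and sorts attributes from most to least informative. It assumes nominal (already discretized) attributes; the records, the attribute names `cgpa` and `attendance`, and the helper functions are all hypothetical, and this is not Weka's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Reduction in class entropy obtained by splitting on a nominal attribute."""
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels):
    """Rank attributes by information gain, highest first."""
    scores = {a: info_gain(rows, labels, a) for a in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical records: discretized CGPA band and attendance level.
rows = [
    {"cgpa": "high", "attendance": "high"},
    {"cgpa": "high", "attendance": "low"},
    {"cgpa": "low", "attendance": "high"},
    {"cgpa": "low", "attendance": "low"},
]
labels = ["pass", "pass", "fail", "fail"]

for attr, gain in rank_attributes(rows, labels):
    print(f"{attr}: {gain:.3f}")
# prints:
# cgpa: 1.000
# attendance: 0.000
```

In this toy data, the CGPA band separates the classes perfectly (gain of 1 bit), while attendance carries no information about the outcome (gain of 0).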
“…In the literature there are many combinations of Data Mining (DM) techniques for discovering new knowledge in the educational context, motivated by different fronts: offline education, for analyzing student performance data; electronic learning (e-learning) and Learning Management Systems; and Intelligent Tutoring Systems and Adaptive Educational Hypermedia Systems [Romero and Ventura, 2010]. For example, Chiheb et al. [2017] analyze undergraduate and graduate students, classifying them according to their results, which management can use to flag students at risk of dropping out. Shayan and van Zaanen [2019], in turn, examine student behavior in the Learning Management System to identify different performance levels during the course, and then present the results to instructors, who can use this information to improve course content and identify the students who need more attention.…”
Access to a large volume of open data has expanded the possibilities for improving the management of public systems. One such database is that of the National High School Exam (Exame Nacional do Ensino Médio, ENEM), which contains relevant information about each candidate and their performance on the exams. In this context, this work aimed to analyze the relationship between mathematics scores and scores in other areas, including the essay, in the ENEM 2019 database. The results show that the mathematics score (MT) is influenced by the other scores and that, among the eight models applied, the best was Gradient Boosting, with a 7.4% error in predicting MT. This analysis is relevant because it can guide public policies that may improve overall academic performance.
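The ENEM study reports Gradient Boosting as the best of eight models for predicting the mathematics score (MT) from the other scores. For squared loss, gradient boosting starts from the mean target and repeatedly fits a small tree to the current residuals, adding a damped copy of each tree to the model. The minimal sketch below uses one-split regression stumps; the score table is hypothetical, the helper names are illustrative, and measuring error as mean absolute percentage error (MAPE) is an assumption, since the abstract does not say how the 7.4% error was computed.

```python
from statistics import fmean


def fit_stump(X, residuals):
    """Least-squares regression stump: one threshold on one feature."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = fmean(left), fmean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    _, feat, thr, left_mean, right_mean = best
    return lambda row: left_mean if row[feat] <= thr else right_mean


def gradient_boost(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = fmean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(row) for p, row in zip(preds, X)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumed error measure)."""
    return 100 * fmean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


# Hypothetical scores: (natural sciences, humanities, languages, essay) -> MT.
X = [(480, 520, 510, 600), (520, 560, 540, 640), (580, 600, 590, 720),
     (610, 640, 620, 780), (650, 670, 640, 860), (450, 480, 470, 520),
     (500, 530, 520, 580), (560, 590, 570, 700)]
y = [500, 560, 640, 690, 750, 440, 520, 610]

model = gradient_boost(X, y)
train_pred = [model(row) for row in X]
print(f"in-sample MAPE: {mape(y, train_pred):.2f}%")
```

The printed error is in-sample and only shows that the procedure fits the data; a real comparison of models would measure error on held-out rows, typically with a library implementation such as scikit-learn's `GradientBoostingRegressor`.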
“…They showed that there is a correlation between the identified variables and that the prediction model can be improved using stacked generalization (a stacking ensemble), resulting in higher accuracy, a lower error rate, and better predictive value. Chiheb et al. [12] propose a highly accurate model: a system in which undergraduate and postgraduate students can be classified according to their decisions, and their performance can be predicted for the coming years based on their current results and historical data. The system can also be used as an early-warning tool for high school students and helps graduates choose an appropriate master's program.…”
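The excerpt credits stacking (stacked generalization) with improving accuracy. In stacking, a level-1 (meta) learner is trained on the out-of-fold predictions of several level-0 learners rather than on their in-sample predictions. The minimal pure-Python sketch below illustrates the idea under stated assumptions: binary 0/1 labels, a hypothetical two-feature dataset (attendance %, midterm score), a 1-nearest-neighbour learner and a decision-stump learner as the base models, and a simple lookup-table meta-learner. None of these choices come from the cited work, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_1nn(X, y):
    """Level-0 learner: 1-nearest neighbour with squared Euclidean distance."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def fit_stump(X, y):
    """Level-0 learner: single-feature threshold rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):  # label predicted for rows with feature > threshold
                preds = [above if row[j] > t else 1 - above for row in X]
                errors = sum(p != yi for p, yi in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    _, feat, thr, label_above = best
    return lambda row: label_above if row[feat] > thr else 1 - label_above


def fit_meta(level1_rows, y):
    """Level-1 learner: most frequent true label for each tuple of base predictions."""
    table = defaultdict(Counter)
    for preds, label in zip(level1_rows, y):
        table[preds][label] += 1
    fallback = Counter(y).most_common(1)[0][0]  # unseen combination -> majority class
    return lambda preds: table[preds].most_common(1)[0][0] if preds in table else fallback


def stack(X, y, base_fitters, k=3):
    """Stacked generalization with k-fold out-of-fold predictions for the meta-learner."""
    n = len(X)
    level1 = [None] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        models = [fit([X[i] for i in train_idx], [y[i] for i in train_idx])
                  for fit in base_fitters]
        for i in range(fold, n, k):  # rows held out in this fold
            level1[i] = tuple(m(X[i]) for m in models)
    meta = fit_meta(level1, y)
    # Refit the base learners on all rows for use at prediction time.
    final_models = [fit(X, y) for fit in base_fitters]
    return lambda row: meta(tuple(m(row) for m in final_models))


# Hypothetical records: (attendance %, midterm score) -> 1 = pass, 0 = fail.
X = [(90, 75), (85, 60), (60, 55), (95, 80), (50, 40),
     (70, 65), (40, 30), (80, 70), (55, 45)]
y = [1, 1, 0, 1, 0, 1, 0, 1, 0]

model = stack(X, y, [fit_1nn, fit_stump])
print(model((92, 78)), model((45, 35)))
# prints: 1 0
```

Training the meta-learner on out-of-fold predictions keeps it from simply trusting whichever base learner overfits; here the 1-NN learner would be perfect in-sample, so in-sample predictions would tell the meta-learner nothing about its real reliability.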
PSAP: Improving Accuracy of Students' Final Grade Prediction using ID3 and C4.5
This study aimed to improve the performance of the Predicting Student Academic Performance (PSAP) system, with the outcome of developing a web application that can be used to analyze student performance during the current semester. The web-based application was developed following the evolutionary prototyping model. The study also analyzes the accuracy of the classifier built for the prediction features of the web application. A qualitative approach based on a user evaluation questionnaire was used. A small number of expert users, all lecturers from Universiti Pendidikan Sultan Idris, were chosen as respondents. Each respondent was asked to answer a total of 27 questions about their background and the design of the web application. The accuracy of the classifier for the prediction features was tested with a confusion matrix on a test set of 24 rows. The findings present the respondents' views on the interface design, functionality, navigation, and reliability of the developed web-based application. The results also showed that the classifier built with the ID3 classification model (C4.5) achieved an accuracy of 79.18%, the highest compared with the Naïve Bayes and Generalized Linear classification models.
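The PSAP study evaluates its classifier with a confusion matrix, and the review at the top of this page reports Precision and Recall as the usual effectiveness metrics. The minimal sketch below shows how accuracy, precision, and recall follow from the confusion-matrix counts. The labels are hypothetical (not the PSAP test set), treating `"fail"` (an at-risk student) as the positive class is an assumption, and so is returning 0.0 when a denominator is zero.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count (TP, FP, FN, TN) for a binary task with the given positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive):
    """Accuracy, precision and recall derived from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are right
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual positives that were found
    return accuracy, precision, recall


# Hypothetical true and predicted outcomes for eight students.
y_true = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "fail"]
y_pred = ["pass", "fail", "fail", "pass", "pass", "fail", "fail", "fail"]

acc, prec, rec = scores(y_true, y_pred, positive="fail")
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# prints: accuracy=0.625 precision=0.600 recall=0.750
```

Reporting precision and recall alongside accuracy matters when at-risk students are a minority: a model that predicts "pass" for everyone can reach high accuracy while having a recall of zero for the students it is meant to flag.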