Abstract.
Student dropout is one of the most pressing problems in higher education. In this
work, using dropout prediction as a case study, we show how explainable
artificial intelligence (XAI) tools such as permutation importance, partial
dependence plots, and SHAP can support stakeholders in higher education. We also
discuss the practical applications of the research, for example how explanations
of individual predictions enable personalized interventions. In our analyses we
found that the high school grade point average has the greatest predictive power
for graduation. Furthermore, although we analyzed data from a technical
university, we found that humanities subjects also have substantial incremental
predictive power for graduation relative to science subjects.
Summary.
Delayed completion and student dropout are among the most critical problems in
higher education, especially in STEM programs. A high dropout rate
causes both individual and economic losses, so a detailed investigation of the
main reasons for dropping out is warranted. Recently, there has been considerable
interest in the use of machine learning methods for the early detection of
students at risk of dropping out. However, there has been little discussion of the
use of interpretable machine learning (IML) and explainable artificial
intelligence (XAI) techniques for dropout prediction. In this paper, we show
how IML and XAI techniques can assist educational stakeholders in dropout
prediction using data from the Budapest University of Technology and Economics.
We demonstrate that complex black-box machine learning algorithms, such as
CatBoost, can effectively detect at-risk students using only
pre-enrollment achievement measures, but they lack interpretability. We
then show how the predictions can be explained both globally and locally using
IML methods including permutation importance (PI), partial dependence plots
(PDP), LIME, and SHAP values.
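
To make the two global techniques named above concrete, the following is a minimal sketch of permutation importance and a partial dependence curve. It is an illustration under stated assumptions, not the pipeline used in the paper: the data are synthetic, the feature names (`hs_gpa`, `math_grade`, `humanities_grade`, `gap_years`) and the data-generating coefficients are hypothetical, and scikit-learn's `GradientBoostingClassifier` stands in for CatBoost so that the example needs only NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical pre-enrollment features; these are NOT the paper's variables.
FEATURES = ["hs_gpa", "math_grade", "humanities_grade", "gap_years"]


def make_synthetic_students(n=2000, seed=0):
    """Generate synthetic students; label 1 means 'graduated', 0 'dropped out'."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(2.0, 5.0, n),  # hs_gpa
        rng.uniform(2.0, 5.0, n),  # math_grade
        rng.uniform(2.0, 5.0, n),  # humanities_grade
        rng.integers(0, 4, n),     # gap_years before enrollment
    ])
    # Assumed data-generating process, chosen only for illustration.
    logit = (2.0 * (X[:, 0] - 3.5) + 0.8 * (X[:, 1] - 3.5)
             + 0.4 * (X[:, 2] - 3.5) - 0.5 * X[:, 3])
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y


X, y = make_synthetic_students()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stand-in black-box model (the paper uses CatBoost).
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance: drop in held-out score when one column is shuffled.
pi = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = sorted(
    zip(FEATURES, pi.importances_mean), key=lambda item: item[1], reverse=True
)

# Partial dependence: average predicted graduation probability as hs_gpa
# is varied over a grid while the other features keep their observed values.
pdp = partial_dependence(
    model, X_test, features=[0], kind="average", method="brute",
    grid_resolution=20,
)
pdp_curve = pdp["average"][0]

for name, score in ranking:
    print(f"{name:>18s}  {score:.4f}")
print("PDP for hs_gpa, low end vs. high end:", pdp_curve[0], pdp_curve[-1])
```

Permutation importance is computed on held-out data here so that it reflects predictive value rather than overfitting. Local methods such as LIME and SHAP are not covered by this sketch.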
Using global interpretations, we found that the most important predictor of
academic performance is the high school grade point average, which
measures general knowledge by taking into account grades in history,
mathematics, Hungarian language and literature, a foreign language and a science
subject. However, we also found that both mathematics and the subject of choice
are among the most important variables, which suggests that program-specific
knowledge is not negligible and complements general knowledge. We discovered
that students are more likely to drop out if they do not start their university
studies immediately after leaving secondary school. Using a partial dependence
plot, we showed that humanities subjects also have incremental predictive power,
even though this analysis is based on data from a technical university.
Finally, we also discuss the potential practical applications of our work, such
as how the explanation of individual predictions allows for personalized
interventions, for example by offering appropriate remedial courses and tutoring
sessions. Our approach is unique in that we not only estimate the probability of
dropping out, but also interpret the model and provide explanations for each
prediction. As a result, this framework can be used in several fields. For
instance, by predicting which majors high school students are most likely to
succeed in based on their high school performance indicators, it could help
them select appropriate university programs and thus support career
guidance. Through the explanations of local predictions,
the framework provided can also assist students in identifying the skills they
need to develop to succeed in their university studies.