This paper presents the work done to support student dropout risk prevention in a real online e-learning environment: a Spanish distance university with thousands of undergraduate students. The main goal is to prevent students from abandoning the university by means of retention actions focused on the most at-risk students, so as to maximize the effectiveness of institutional efforts in this direction. To this end, we generated predictive models based on the C5.0 algorithm using data from more than 11,000 students collected over five years. We then developed SPA, an early warning system that uses these models to generate static early dropout-risk predictions and dynamic, periodically updated ones. It also supports the recording of the resulting retention-oriented interventions for further analysis. SPA has been in production since 2017 and is currently in its fourth semester of continuous use. It has calculated more than 117,000 risk scores to predict the dropout risk of more than 5,700 students. About 13,000 retention actions have been recorded. The white-box predictive models used in production provided reasonably good results, very close to those obtained in the laboratory. On the way from research to production, we faced several challenges that had to be addressed effectively for the system to succeed. In this paper, we share the challenges faced and the lessons learnt during this process. We hope this helps those who wish to move from predictive modelling with potential value to the operation of complete dropout prevention systems that provide sustained value in real production scenarios.
Educational data mining (EDM) combines the techniques of data mining with educational data in order to provide students, instructors, and researchers with knowledge that can benefit academic processes. Due to globalization, foreign language learning (FLL) has become increasingly important. This work seeks to gain insight into how data mining (DM) is being used to benefit FLL. For this purpose, an advanced review of pertinent research published from 2012 to 2017 was performed. After applying our screening method, 208 papers were selected for exhaustive analysis. This analysis was divided into four aspects: context (educational environment and educational level), number of items, DM methods, and DM applications. The results indicated that 54% of studies were conducted in traditional environments, while only 3% were performed in an m-learning environment. In addition, 25% of the research was conducted at the primary or secondary level and 72% at the tertiary or adult level. Likewise, 76% of studies used datasets of fewer than 1,000 items. The most utilized EDM methods were factor analysis, regression, text mining, correlation mining, and causal DM. The studies analyzed also showed that DM is mainly employed to predict the performance of students, to check learners' motivation, and to provide feedback for instructors. These results seem to indicate that although DM has much to offer the increasing number of language students, it is not being used to its full potential.
This article is categorized under:
Application Areas > Education and Learning
Fundamental Concepts of Data and Knowledge > Data Concepts