Even highly motivated undergraduates drift off their STEM career pathways. In large introductory STEM classes, instructors struggle to identify and support these students. To address these issues, we developed co‐redesign methods in partnership with disciplinary experts to create high‐structure STEM courses that better support students and produce informative digital event data. To those data, we applied theory‐ and context‐relevant labels to reflect active and self‐regulated learning processes involving LMS‐hosted course materials, formative assessments, and help‐seeking tools. We illustrate the predictive benefits of this process across two cycles of model creation and reapplication. In cycle 1, we used theory‐relevant features from three weeks of data to inform a prediction model that accurately identified struggling students and sustained its accuracy when reapplied in future semesters. In cycle 2, we refit a model with temporally contextualized features that achieved superior accuracy using data from just two class meetings. This modelling approach can produce durable learning analytics solutions that support scaled and sustained prediction and intervention built on explainable artificial intelligence products. The same products that inform prediction can also guide intervention approaches and inform future instructional design and delivery.
Practitioner notes
What is already known about this topic
Learning analytics includes an evolving collection of methods for tracing and understanding student learning through their engagements with learning technologies.
Prediction models based on demographic data can perpetuate systemic biases.
Prediction models based on behavioural event data can produce accurate predictions of academic success, and validation efforts can enrich those data to reflect students' self‐regulated learning processes within learning tasks.
What this paper adds
Learning analytics can be successfully applied to predict performance in an authentic postsecondary STEM context, and the use of context and theory as guides for feature engineering can ensure sustained predictive accuracy upon reapplication.
The consistent types of learning resources and the cyclical nature of their provisioning from lesson to lesson are hallmarks of high‐structure active learning designs that are known to benefit learners. These designs also provide opportunities to observe and model contextually grounded, theory‐aligned and temporally positioned learning events. Features built from these events informed prediction models that accurately classified students on initial application and on reapplication in subsequent semesters.
Co‐design relationships where researchers and instructors work together toward pedagogical implementation and course instrumentation are essential to developing unique insights for feature engineering and producing explainable artificial intelligence approaches to predictive modelling.
Implications for practice and/or policy
High‐structure course designs can scaffold student engagement with course materials to make learning more effective and products of feature engineering more explainable.
Learning analytics initiatives can avoid perpetuation of systemic biases when methods prioritize theory‐informed behavioural data that reflect learning processes, sensitivity to instructional context and development of explainable predictors of success rather than relying on students' demographic characteristics as predictors.
Prioritizing behaviours as predictors improves explainability in ways that can inform the redesign of courses and design of learning supports, which further informs the refinement of learning theories and their applications.
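As a minimal sketch of the kind of feature engineering summarized above, the following Python example counts theory‐labelled LMS events per student within an early observation window. It is an illustration under stated assumptions, not the pipeline used in this study: the event names, the label mapping `EVENT_LABELS`, the function `early_features` and the 21‐day default window are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical mapping from raw LMS event types to theory-aligned labels.
# The real labels would come from the course co-design process.
EVENT_LABELS = {
    "view_lecture_notes": "content_access",
    "attempt_practice_quiz": "self_assessment",
    "open_discussion_board": "help_seeking",
}


def early_features(events, course_start, window_days=21):
    """Count labelled events per student in the first `window_days` of a course.

    `events` is an iterable of (student_id, event_type, timestamp) tuples.
    Events with no label, or outside the window, are ignored.
    """
    cutoff = course_start + timedelta(days=window_days)
    features = defaultdict(Counter)
    for student_id, event_type, timestamp in events:
        label = EVENT_LABELS.get(event_type)
        if label is None or not (course_start <= timestamp < cutoff):
            continue
        features[student_id][label] += 1
    # Return plain dicts so the result is easy to inspect or serialize.
    return {sid: dict(counts) for sid, counts in features.items()}


if __name__ == "__main__":
    start = datetime(2024, 1, 8)
    demo_events = [
        ("s1", "view_lecture_notes", datetime(2024, 1, 9)),
        ("s1", "attempt_practice_quiz", datetime(2024, 1, 10)),
        ("s2", "open_discussion_board", datetime(2024, 1, 12)),
    ]
    print(early_features(demo_events, start))
```

Because each feature is a count of a named, theory‐aligned behaviour, the inputs to any downstream classifier remain directly interpretable by instructors. Shortening `window_days` mimics the move toward earlier prediction, although the temporally contextualized features described for cycle 2 would require positioning events relative to individual class meetings, which this sketch does not attempt.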