ABSTRACT: In this paper, we investigate, in a manner that is both longitudinal and fine-grained, how student affect and behavioural engagement in a web-based tutoring platform throughout the school year correspond to learning outcomes on a high-stakes mathematics exam at the end of the year. Affect and behaviour detectors are used to estimate student affective states and behaviour based on post-hoc analysis of tutor log data. For every student action in the tutor, the detectors give us an estimated probability that the student is in a state of boredom, engaged concentration, confusion, or frustration, and an estimated probability that the student is exhibiting off-task or gaming behaviours. We used data from the ASSISTments math tutoring system and found that boredom during problem solving is negatively correlated with performance, as expected; however, boredom is positively correlated with performance when exhibited during scaffolded tutoring. Unexpectedly, a similar pattern is seen for confusion. Engaged concentration and, surprisingly, frustration are both associated with positive learning outcomes. In a second analysis, we build a unified model that predicts student standardized examination scores from a combination of student affect, disengaged behaviour, and performance within the learning system. This model achieves a high overall correlation with standardized exam scores, showing that these types of features can effectively infer longer-term learning outcomes.
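As a rough illustration of the first analysis described in this abstract, the sketch below averages per-action detector probabilities for each student within a tutoring context and correlates those averages with end-of-year exam scores. It is not the authors' pipeline: the function names, the `(student, context, probability)` tuple layout, the context labels, and the choice of a per-student mean with Pearson's r are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def mean_probability_by_context(actions):
    """Average detector probability per (student, context).

    `actions` is an iterable of (student_id, context, probability) tuples,
    one per logged student action (hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (student, context) -> [sum, count]
    for student, context, probability in actions:
        cell = totals[(student, context)]
        cell[0] += probability
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def correlate_with_exam(actions, exam_scores, context):
    """Correlate per-student mean probability in one context with exam score."""
    means = mean_probability_by_context(actions)
    students = sorted(
        s for (s, c) in means if c == context and s in exam_scores
    )
    xs = [means[(s, context)] for s in students]
    ys = [exam_scores[s] for s in students]
    return pearson(xs, ys)
```

Calling `correlate_with_exam` once per affective state and per context (for example, original problems versus scaffolded tutoring) would give one coefficient for each pairing, which is the kind of comparison the abstract reports.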
Information and communication technology (ICT)‐enhanced research methods such as educational data mining (EDM) have allowed researchers to effectively model a broad range of constructs pertaining to the student, moving from traditional assessments of knowledge to assessment of engagement, meta‐cognition, strategy and affect. The automated detection of these constructs allows EDM researchers to develop intervention strategies that can be implemented either by the software or the teacher. It also allows for secondary analyses of the construct, where the detectors are applied to a data set that is much larger than one that could be analyzed by more traditional methods. However, in many cases, the data used to develop EDM models are collected from students who may not be representative of the broader populations who are likely to use ICT. In order to use EDM models (automated detectors) with new populations, their generalizability must be verified. In this study, we examine whether detectors of affect remain valid when applied to new populations. Models of four educationally relevant affective states were constructed based on data from urban, suburban and rural students using ASSISTments software for middle school mathematics in the Northeastern United States. We found that affect detectors trained on a population drawn primarily from one demographic grouping do not generalize to populations drawn primarily from the other demographic groupings, even though those populations might be considered part of the same national or regional culture. Models constructed using data from all three subpopulations are more applicable to students in those populations than those trained on a single group, but still do not achieve ideal population validity—the ability to generalize across all subgroups. In particular, models generalize better across urban and suburban students than rural students. 
These findings have important implications for data collection efforts, validation techniques, and the design of interventions that are intended to be applied at scale.
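The population-validity check described in this abstract (train a detector on one group, then test it on the others) can be sketched as follows. This is a minimal illustration rather than the study's method: the `fit` and `score` callables, the one-feature threshold detector, and the use of plain accuracy as the metric are hypothetical stand-ins.

```python
def cross_population_scores(data_by_group, fit, score):
    """Train on each group and evaluate on every other group.

    `data_by_group` maps a group name to its labelled rows.
    Returns {(train_group, test_group): score}.
    """
    results = {}
    for train_group, train_rows in data_by_group.items():
        model = fit(train_rows)
        for test_group, test_rows in data_by_group.items():
            if test_group != train_group:
                results[(train_group, test_group)] = score(model, test_rows)
    return results


def fit_threshold(rows):
    """Toy detector: threshold halfway between the two class means.

    Each row is (feature, label) with label 0 or 1 (hypothetical layout).
    """
    positives = [x for x, label in rows if label == 1]
    negatives = [x for x, label in rows if label == 0]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2


def accuracy(threshold, rows):
    """Fraction of rows where (feature > threshold) matches the label."""
    correct = sum(1 for x, label in rows if (x > threshold) == (label == 1))
    return correct / len(rows)
```

A detector with good population validity would score similarly on every `(train_group, test_group)` pair. Scoring within a group is deliberately left out here, since it would need held-out students to avoid testing on the training data.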