“…Both Presentation Sensei (Kurihara, Goto, Ogata, Matsusaka, & Igarashi, 2007) and Cicero (Batrinca et al., 2013; Wörtwein et al., 2015) were evaluated only in this way. Newer systems, such as the one by Gan, Wong, Mandal, Chandrasekhar, and Kankanhalli (2015), AutoManner (Tanveer, Zhao, Chen, Tiet, & Hoque, 2016), and RAP (Ochoa et al., 2018), had their accuracy measured to establish whether the automatically estimated values correlated with human annotations of the same features. All these evaluations were laboratory‐based, because either the participants were not students or the setting was not an authentic learning activity.…”