This paper reports on a study to predict students at risk of failing based on data available prior to commencement of first year. The study was conducted over three years, 2010 to 2012, on a student population from a range of academic disciplines, n=1,207. Data was gathered from both student enrollment records and an online, self-reporting, learner-profiling tool administered during first-year student induction. Factors considered included prior academic performance, personality, motivation, self-regulation, learning approaches, age, and gender. Models were trained on data from the 2010 and 2011 student cohorts, and tested on data from the 2012 student cohort. A comparison of eight classification algorithms found that k-NN achieved the best model accuracy (72%), but results from other models were similar, including ensembles (71%), support vector machine (70%), and a decision tree (70%). However, improvements in model accuracy attributable to non-cognitive factors were not significant. Models of subgroups by age and discipline achieved higher accuracies, but were affected by sample size; samples of n<900 underrepresented patterns in the dataset. Factors most predictive of academic performance in the first year of study at tertiary education included age, prior academic performance, and self-efficacy. Early modelling of first-year students yielded informative, generalizable models that identified students at risk of failing.
Keywords: Learning analytics, learner profiling, academic performance, non-cognitive factors of learning, tertiary education
INTRODUCTION
Enrollment numbers in tertiary education are increasing, as is diversity in student populations (OECD, 2013; Patterson, Carroll, & Harvey, 2014); however, significant numbers of students do not complete the courses in which they enroll, particularly courses with lower entry requirements (ACT, 2012; Mooney, Patterson, O'Connor, & Chantler, 2010). Factors predictive of academic performance have been the focus of research for many years (Farsides & Woodfield, 2003; Moran & Crowley, 1979), and continue as an active research topic (Jayaprakash, Moody, Lauria, Regan, & Baron, 2014; Cassidy, 2011; Wise & Shaffer, 2015), indicating the inherent difficulty in generating accurate learning factor models (Knight, Buckingham Shum, & Littleton, 2013; Tempelaar, Cuypers, van de Vrie, Heck, & van der Kooij, 2013).
Tertiary education providers collect much data on students, including demographic data, academic activity, and log data from online campus activities. As a result, the application of data analytics to educational settings is an emerging and growing research discipline (Campbell, deBlois, & Oblinger, 2007; Mirriahi, Gašević, Long, & Dawson, 2014; Sachin & Vijay, 2012; Siemens & Baker, 2012). The primary aim of learning analytics is to provide learning professionals, and students, with actionable information that can be used to enhance the learning process (Siemens, 2012; Chatti, Dyckhoff, Schroeder, & Thüs, 2012). Much of the published work in learning analytics is based on...