Automated programming assessment systems are useful tools for tracking students' learning progress automatically, thereby reducing the workload of educators. They can also provide insights into how students learn, making it easier to formulate strategies for enhancing learning performance. Whereas the functional correctness of code is always inspected, code quality is an essential aspect that few educators consider when designing an automated programming assessment system. In this study, we apply data mining techniques to the results of an automated assessment system to reveal hidden patterns in code quality improvement that are predictive of final achievement in the course. Cluster analysis is first used to categorize students according to their learning behavior and outcomes. Cluster profile analysis is then used to highlight actionable factors that could affect their final grades. Finally, the same factors are used to construct a classification model for making early predictions of students' final results. Our empirical results demonstrate the efficacy of the proposed scheme in providing valuable insights into the learning behaviors of students in novice programming courses, especially with respect to code quality assurance, which could be used to enhance programming performance at the university level.

INDEX TERMS Automated programming assessment system, code quality, educational data mining, early learning achievement detection, programming education