Tools for automatically grading programming assignments, also known as Online Judges, have been widely used to support computer science (CS) courses. Nevertheless, few studies have used these tools to acquire and analyse interaction data to better understand students' performance and behaviours, often due to limited data availability or inadequate granularity. To address this problem, we propose an Online Judge called CodeBench, which allows for fine-grained collection of student interaction data, at the level of, e.g., keystrokes, number of submissions, and grades. We deployed CodeBench for three years (2016–18) and collected data from 2058 students across 16 introductory computer science (CS1) courses, on which we carried out fine-grained learning analytics aimed at early detection of effective and ineffective behaviours in learning CS concepts. Our results reveal clear behavioural classes of CS1 students, significantly differentiated both semantically and statistically, enabling us to better explain how student behaviours during programming influence learning outcomes. Finally, we identify behaviours that can guide novice students to improve their learning performance, which can be used for interventions. We believe this work is a step towards enhancing Online Judges and helping teachers and students improve their CS1 teaching and learning practices.
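To make the behavioural-class analysis concrete, the sketch below clusters students by interaction features such as keystroke counts, submission counts, and grades. The abstract does not name the clustering algorithm or the exact features, so k-means, the feature set, and the silhouette-based model selection here are illustrative assumptions rather than the paper's method.

```python
# Illustrative sketch: clustering CS1 students into behavioural classes from
# fine-grained interaction features. The algorithm (k-means), the features,
# and the synthetic data are assumptions for demonstration only.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Hypothetical per-student features: keystrokes per session,
# number of submissions, and average grade.
X = rng.random((200, 3)) * np.array([5000.0, 40.0, 100.0])
X_scaled = StandardScaler().fit_transform(X)

# Choose the number of behavioural classes by silhouette score.
best_k, best_score = 2, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"behavioural classes: k = {best_k} (silhouette = {best_score:.2f})")
```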
Introductory programming is challenging for many students, and these courses suffer from high failure and dropout rates. A potential way to tackle this problem is to predict student performance at an early stage, as this facilitates human-AI collaboration towards prescriptive analytics, in which instructors and monitors are advised how to intervene and support students; early intervention is crucial. However, the literature indicates that there is still no reliable predictor of programming students' performance, since even large-scale analyses of multiple features have yielded only limited predictive power. Deep learning (DL), in contrast, can provide high-quality results for large amounts of data and complex problems. We therefore employed DL for early prediction of students' performance, using data collected during the very first two weeks of introductory programming courses offered to a total of 2058 students over six semesters (a longitudinal study). We compared our results with the state of the art, an evolutionary algorithm (EA) that automatically creates and optimises machine learning pipelines. Our DL model achieved an average accuracy of 82.5%, statistically superior to the model constructed and optimised by the EA (p-value << 0.05 even with Bonferroni correction). In addition, we adapted the DL model into a stacking ensemble for continuous prediction; the resulting regression model explained ~62% of the final grade variance. In closing, we also provide results on interpreting our regression model to understand the leading factors of success and failure in introductory programming.
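As a rough illustration of the early-prediction setup, the sketch below trains a small feedforward network on synthetic "first two weeks" features to predict pass/fail. The paper's actual architecture, features, and hyperparameters are not given in this abstract, so the layer sizes, dropout rate, and 20-feature input here are assumptions.

```python
# Illustrative sketch of early pass/fail prediction from first-two-weeks
# interaction data. The network below (layer sizes, dropout, 20 synthetic
# features) is an assumption; the paper's architecture is not reproduced.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(2058, 20)).astype("float32")   # synthetic features
y = (X[:, :5].sum(axis=1) > 0).astype("float32")    # synthetic pass/fail label

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=64, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy] on the toy data
```

For the continuous-prediction variant, a model like this could serve as one base learner in a stacking ensemble (e.g., combined with classical regressors via scikit-learn's StackingRegressor), in line with the abstract's description of the regression setup.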
As programming skills are increasingly required worldwide and across disciplines, many students use online platforms that provide automatic feedback through a Programming Online Judge (POJ) mechanism. POJs are very popular e-learning tools, boasting large collections of programming problems. Despite their many benefits, students often struggle when solving problems incompatible with their prior knowledge. One important cause is that problem statements are usually not classified according to programming topics (paradigms, data structures, etc.), so students waste time and effort trying to solve exercises that are not tailored to their level and needs. To support students, we propose a new, "front-heavy" pipeline method to predict the topics of POJ problems, using Bidirectional Encoder Representations from Transformers (BERT) for contextual text augmentation over the problem statements, followed by (lighter-weight) classical machine learning for classification. Our model outperformed the current state of the art, with an F1-score of ≈86% under stratified 10-fold cross-validation on a classically challenging multi-class problem with seven categories. As a proof of concept, we conducted an experiment showing how our predictive model can serve as a human-AI hybrid complement for POJs, where learners use AI-based recommendations to find the most appropriate problems.
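The sketch below illustrates one plausible reading of this pipeline: BERT's fill-mask head generates contextual paraphrases of problem statements, and a lightweight TF-IDF plus linear SVM classifier is trained on the enlarged corpus. The model choice (bert-base-uncased), the single-word masking strategy, and the toy labels are assumptions, not the paper's configuration.

```python
# Illustrative sketch: BERT contextual augmentation of problem statements,
# then a lightweight classical classifier. Model, masking strategy, and
# labels below are assumptions for demonstration only.
import random
from transformers import pipeline
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def augment(statement: str) -> str:
    """Mask one random word and let BERT propose a contextual substitute."""
    words = statement.split()
    i = random.randrange(len(words))
    masked = " ".join(words[:i] + ["[MASK]"] + words[i + 1:])
    return unmasker(masked, top_k=1)[0]["sequence"]

statements = ["Given an array of integers, sort it in ascending order.",
              "Find the shortest path between two nodes in a graph."]
labels = ["sorting", "graphs"]  # hypothetical topic labels

# Train on the originals plus one augmented copy of each.
X = statements + [augment(s) for s in statements]
y = labels * 2

clf = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
                ("svm", LinearSVC())])
clf.fit(X, y)
print(clf.predict(["Sort the list of numbers given as input."]))
```

This split keeps the heavy transformer on the "front" of the pipeline (offline augmentation) while the classifier that runs repeatedly stays cheap, which matches the "front-heavy" framing in the abstract.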
Programming online judges (POJs) are autograders that have been increasingly used in introductory programming courses (also known as CS1), since these systems provide instantaneous, accurate feedback on learners' code solutions and reduce instructors' workload in evaluating assignments. Nonetheless, learners typically struggle to find problems in POJs that are adequate for their programming skills. A potential reason is that POJs present problems of varied categories and difficulty levels, which may cause cognitive overload due to the large amount of information (and choice) presented to the student. Students can thus feel less capable, which may result in undesirable affective states such as frustration and demotivation, decreasing their performance and potentially increasing dropout rates. Recently, new research has emerged on systems that recommend problems in POJs; however, the data collection for these approaches was not fine-grained and, importantly, did not take into account students' previous effort and achievement. This study therefore proposes, for the first time, a prescriptive analytics solution for students' programming behaviour by constructing and evaluating an automatic recommender module based on students' effort, to personalise the problems presented to the learner in POJs. The aim is to improve learners' achievement whilst minimising negative affective states in CS1 courses. Results from a within-subject, double-blind controlled experiment show that our method significantly improved positive affective states whilst minimising negative ones. Moreover, our recommender significantly increased students' achievement (correct solutions) and reduced dropout and failure in problem-solving.
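A minimal sketch of an effort-based recommender follows, assuming a skill estimate derived from past attempts and correct solutions, and a per-problem difficulty score on the same [0, 1] scale. The scoring rule, field names, and "slightly above current level" target are hypothetical, not the module evaluated in the study.

```python
# Illustrative sketch of an effort-based problem recommender. The skill
# estimate, difficulty scale, and ranking rule are assumptions; the paper's
# actual recommender module is not reproduced here.
from dataclasses import dataclass

@dataclass
class Problem:
    pid: str
    difficulty: float  # assumed scale: 0.0 (easy) to 1.0 (hard)

def estimate_skill(attempts: int, correct: int) -> float:
    """Crude skill estimate: success rate, shrunk toward 0.5 when the
    student has few attempts (a simple Bayesian-style prior)."""
    prior_weight = 5
    return (correct + 0.5 * prior_weight) / (attempts + prior_weight)

def recommend(problems, attempts, correct, k=3):
    """Rank problems whose difficulty best matches a target slightly above
    the student's estimated skill: challenging but attainable."""
    target = min(1.0, estimate_skill(attempts, correct) + 0.1)
    return sorted(problems, key=lambda p: abs(p.difficulty - target))[:k]

pool = [Problem("P1", 0.2), Problem("P2", 0.5), Problem("P3", 0.8)]
print([p.pid for p in recommend(pool, attempts=10, correct=6)])
```

The intent of the "slightly above skill" target is to keep recommended problems in the student's zone of attainable challenge, which is one plausible way to pursue the abstract's goal of raising achievement while limiting frustration.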
This work involved human subjects or animals in its research. The authors confirm that all human/animal subject research procedures and protocols are exempt from review board approval.