The use of autograding to assess programming students may lead to unfairness if an autograder is incorrectly configured. Mutation analysis offers a potential solution to this problem. By simulating student coding mistakes, an automated technique can evaluate the fairness and completeness of an autograding configuration. In this paper, we introduce a set of mutation operators to be used in such a technique, derived from a mistake classification of real student solutions for two introductory programming tasks.
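To make the idea concrete, here is a minimal sketch of how one such mutation operator might be applied to a reference solution to check whether a grading test suite notices the simulated mistake. The operator (`BoundaryMistake`, swapping `<` for `<=`), the task (`count_below`), and both test suites are hypothetical illustrations and are not taken from the paper's operator set.

```python
import ast

# Hypothetical reference solution for an introductory task.
REFERENCE_SOURCE = """
def count_below(values, limit):
    count = 0
    for v in values:
        if v < limit:
            count += 1
    return count
"""


class BoundaryMistake(ast.NodeTransformer):
    """Hypothetical operator: replace `<` with `<=` (an off-by-one slip)."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op for op in node.ops]
        return node


def load_function(source, name, transformer=None):
    """Parse the source, optionally mutate it, and return the named function."""
    tree = ast.parse(source)
    if transformer is not None:
        tree = ast.fix_missing_locations(transformer.visit(tree))
    namespace = {}
    exec(compile(tree, "<solution>", "exec"), namespace)
    return namespace[name]


def detects(tests, func):
    """True if at least one (args, expected) pair fails for func."""
    return any(func(*args) != expected for args, expected in tests)


reference = load_function(REFERENCE_SOURCE, "count_below")
mutant = load_function(REFERENCE_SOURCE, "count_below", BoundaryMistake())

# Assumed grading suites: the weak one never exercises the boundary value.
weak_tests = [(([1, 2, 3], 10), 3), (([], 5), 0)]
strong_tests = weak_tests + [(([1, 5, 9], 5), 1)]

print("weak suite detects mutant:  ", detects(weak_tests, mutant))
print("strong suite detects mutant:", detects(strong_tests, mutant))
```

A suite that lets the mutant pass would award full marks to a student who made the same mistake, which is the kind of configuration gap the technique is meant to expose.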
Automated grading allows for the scalable assessment of large programming courses, often using test cases to determine the correctness of students' programs. However, test suites can vary in multiple ways, such as quality, size, and coverage. In this paper, we investigate how much test suites with varying properties can impact generated grades, and how these properties cause this impact. We conduct a study on artificial faulty programs that simulate students' programming mistakes and test suites generated from manually written tests. We find that these test suites generate greatly varying grades, with the standard deviation of grades for each fault typically representing approximately 84% of the grades not apportioned to the fault. We show that different properties of test suites can influence the grades that they produce, with coverage typically having the greatest effect, and mutation score and the potentially redundant repeated coverage of lines also having a significant impact. We offer suggestions based on our findings to assist tutors with building grading test suites that assess students' code in a fair and consistent manner. These suggestions include ensuring that test suites have 100% coverage, avoiding unnecessarily re-covering lines, and checking test suites using real or artificial faults.
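The effect described above can be illustrated with a small sketch: one faulty program is graded by every test suite that can be drawn from a pool of manually written tests, and the spread of the resulting grades is measured. The fault (`faulty_is_leap_year`), the test pool, the suite size of three, and the grading rule (percentage of tests passed) are all assumptions made for illustration; they are not the study's actual setup.

```python
from itertools import combinations
from statistics import mean, pstdev


def faulty_is_leap_year(year):
    # Simulated student mistake: the "divisible by 400" rule is missing.
    return year % 4 == 0 and year % 100 != 0


# Hypothetical pool of manually written tests: (input, expected output).
TEST_POOL = [
    (2000, True),
    (1900, False),
    (2024, True),
    (2023, False),
    (2400, True),
    (2100, False),
]


def grade(program, suite):
    """Assumed grading rule: percentage of tests in the suite that pass."""
    passed = sum(1 for arg, expected in suite if program(arg) == expected)
    return 100.0 * passed / len(suite)


# Every suite of three tests that can be generated from the pool.
suites = list(combinations(TEST_POOL, 3))
grades = [grade(faulty_is_leap_year, suite) for suite in suites]

print(f"suites: {len(suites)}")
print(f"lowest grade:  {min(grades):.1f}")
print(f"highest grade: {max(grades):.1f}")
print(f"mean grade:    {mean(grades):.1f}")
print(f"std deviation: {pstdev(grades):.1f}")
```

Under these assumptions the same fault can receive anything from a failing grade to full marks depending only on which tests the suite happens to contain, which is why checking a suite against real or artificial faults is suggested.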