Exams are a critical part of our
current teaching paradigm and
are used to assign grades, evaluate teaching strategies, and more.
Unfortunately, in the absence of shared or standardized exams, exam
creation often rests solely on individual instructors, whose decisions
are frequently guided by intuition. Here we show that, even on a single
general chemistry instructor/classroom basis, exam quality can be
improved by using (1) classical test theory and Rasch modeling
to guide the selection of reused questions and (2) a few general design
criteria for multiple-choice questions. After only two to three iterations,
we observed a dramatic improvement in both question and overall exam
quality in nearly every quantitative metric. We also use real outcomes
to show that (1) there is no evidence that students have access to
previous exams or, if they do, that it improves their performance
on repeated questions, and (2) a semester-to-semester change in exam
averages may not reflect student abilities but instead could be due
to changes in question difficulty.
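
For readers unfamiliar with the two frameworks named above, the following is a minimal illustrative sketch, not the analysis used in this work. It assumes dichotomously scored (0/1) responses and computes two standard classical test theory item statistics: difficulty as the proportion of correct answers, and discrimination as the correlation between an item and the rest-of-exam score. It also gives the standard Rasch model probability of a correct response. The function names and the small response matrix are hypothetical.

```python
import math
from statistics import mean, pstdev


def item_statistics(responses):
    """Return (difficulty, discrimination) for each question.

    responses: list of per-student lists of 0/1 scores (hypothetical data layout).
    difficulty: proportion of students answering the item correctly.
    discrimination: Pearson correlation between the item score and the
    student's score on all other items; None if either has no variance.
    """
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        difficulty = mean(item)
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            discrimination = None  # correlation is undefined without variance
        else:
            products = [x * y for x, y in zip(item, rest)]
            cov = mean(products) - mean(item) * mean(rest)
            discrimination = cov / (sd_item * sd_rest)
        stats.append((difficulty, discrimination))
    return stats


def rasch_probability(theta, b):
    """Rasch model: probability that a student of ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical responses: 4 students x 3 questions (1 = correct).
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
results = item_statistics(example)
```

Under these assumptions, an item with a low or negative discrimination value would be a candidate for revision or removal before reuse, while the Rasch difficulty parameter places items from different semesters on a common scale.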