Instructors in higher education frequently employ examinations composed of problem-solving questions to assess student knowledge and learning. But are student scores on these tests reliable? Surprisingly few researchers have examined this question empirically, arguably because of perceived limitations in traditional research methods. Furthermore, many believe multiple-choice exams to be a more objective and reliable form of testing students than any other type. We question this widespread belief. In a series of empirical studies in 8 classes (401 students) in a finance course, we examined these questions using a methodology based on three key elements: a true experimental design, a more appropriate estimator of exam score reliability, and reliability confidence intervals. Internal consistency reliabilities of problem-solving test scores were consistently high (all > .87, median = .90) across different classes, students, examiners, and exams. In contrast, multiple-choice test scores were less reliable (all < .69). Recommendations are presented for improving the construction of exams in higher education.
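
To make the reliability terminology concrete, the sketch below illustrates one common internal consistency estimator, Cronbach's alpha, together with a percentile bootstrap confidence interval. This is not the authors' code or necessarily the specific estimator used in the studies (the abstract only says a "more appropriate estimation" was used); the data, variable names, and sample sizes are hypothetical and for illustration only.

```python
# Illustrative sketch (not the paper's method): Cronbach's alpha for
# item-level exam scores, with a percentile bootstrap confidence interval.
# All data and names below are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = students, columns = exam items (points scored)."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def bootstrap_alpha_ci(items: np.ndarray, n_boot: int = 2000,
                       level: float = 0.95, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for alpha, resampling students with replacement."""
    rng = np.random.default_rng(seed)
    n = items.shape[0]
    stats = [cronbach_alpha(items[rng.integers(0, n, n)]) for _ in range(n_boot)]
    lo = np.percentile(stats, 100 * (1 - level) / 2)
    hi = np.percentile(stats, 100 * (1 + level) / 2)
    return lo, hi

# Hypothetical example: 30 students, 5 problem-solving items
rng = np.random.default_rng(1)
ability = rng.normal(size=(30, 1))
scores = ability + rng.normal(scale=0.5, size=(30, 5))   # items share a common factor
print(f"alpha = {cronbach_alpha(scores):.2f}, 95% CI = {bootstrap_alpha_ci(scores)}")
```

Reporting an interval rather than a point estimate is what the third methodological element refers to: it conveys how much the reliability estimate itself could vary across samples of students.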