Abstract - Any lecturer would agree that marking exams is the bane of her existence. A time-consuming and tiring process, it often requires complex, subjective judgments. Higher education exams typically take 3.0 hours. Do they really need to last so long? Can we justifiably reduce the number of questions on them? Shortening an exam by one hour, if justified, should result in a one-third reduction in lecturer time and effort spent marking. Surprisingly little empirical research has addressed these problems. Classical methods may be partly to blame for this dearth of studies. We propose an alternative methodology based on three key components, including two recent developments in experimental design and statistics: synthetic experimental designs and equivalence hypothesis testing. The third component consists of comparing, on six psychometric criteria, student performance in a class on the standard 3.0-hr final exam with performance on shortened exams containing proportionately fewer questions. Two of these criteria are the frequently misunderstood standard psychometric criteria, reliability and validity. We argue that adding four common-sense criteria (justifiability of test use, number of exam questions, equivalence in mean student performance, and correspondence between shortened and full-length exam scores) confers significant additional benefits. Our approach provides a simple methodology that lecturers can, with minimal time and effort, use to examine the effect of shortening exams for their own classes.

Index Terms - Exam length, psychometric criteria, synthetic experimental designs, test length.

Manuscript received November 6, 2013; revised January 12, 2014.

I. INTRODUCTION

Marking exams is the bane of any lecturer's existence. It is characteristically a tedious, time-consuming process. The intricate, subjective judgments that are required can exhaust even the most dedicated of lecturers. Yet despite the almost universal loathing of lecturers for this activity, surprisingly little research has been published on ways to reduce the time and effort required to mark conventional written examinations. The purpose of the present study is to redress this long-standing neglect. In this paper, we are concerned with the mixed-format exams used most commonly in academe, consisting of a mixture of different types of questions, including problem solving (requiring detailed solutions), essay, short answer, and multiple choice questions.

Written final examinations three hours or more in length are common in many universities and colleges around the world. Why then, if marking is so universally loathed, are exams so long (both in duration and number of questions posed)? Custom or tradition seems the primary reason [1]. Even more surprisingly, long exams have been retained despite dramatic increases in class sizes over the past couple of decades at most institutions of higher learning. Many lecturers complain of class sizes more than doubling in just the last 15-20 years [2]-[4]. In effect, marking time and effort for many lecturers has, at a minimum, effective...
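To make the equivalence hypothesis testing component mentioned in the abstract concrete, the sketch below shows a paired two one-sided tests (TOST) procedure: it asks whether each student's score on a shortened exam is, on average, within an equivalence margin of their score on the full 3.0-hr exam. This is an illustrative assumption of how such a test might be run, not the authors' actual procedure; the simulated arrays and the 5-percentage-point margin are hypothetical.

```python
# Minimal sketch of a paired TOST equivalence check (assumed setup, not the paper's code).
import numpy as np
from scipy import stats

def paired_tost(full, short, margin):
    """Paired TOST: equivalence is claimed if the larger one-sided p-value < alpha."""
    d = np.asarray(short, float) - np.asarray(full, float)
    se = d.std(ddof=1) / np.sqrt(len(d))
    df = len(d) - 1
    p_lower = 1 - stats.t.cdf((d.mean() + margin) / se, df)  # H0: mean diff <= -margin
    p_upper = stats.t.cdf((d.mean() - margin) / se, df)      # H0: mean diff >= +margin
    return d.mean(), max(p_lower, p_upper)

rng = np.random.default_rng(0)
full_exam = rng.normal(68, 12, 150)                 # simulated full-exam percentage scores
short_exam = full_exam + rng.normal(0, 4, 150)      # simulated shortened-exam scores
mean_diff, p = paired_tost(full_exam, short_exam, margin=5.0)
print(f"mean difference = {mean_diff:.2f}, TOST p = {p:.4f}")
```

If the larger of the two one-sided p-values falls below the chosen alpha, the shortened and full-length exam means are declared equivalent within the stated margin.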
Instructors in higher education frequently employ examinations composed of problem-solving questions to assess student knowledge and learning. But are student scores on these tests reliable? Surprisingly few researchers have examined this question empirically, arguably because of perceived limitations in traditional research methods. Furthermore, many believe multiple-choice exams to be a more objective, reliable form of testing students than any other type. We question this widespread belief. In a series of empirical studies in 8 classes (401 students) in a finance course, we used a methodology based on three key elements to examine these questions: a true experimental design, more appropriate estimation of exam score reliability, and reliability confidence intervals. Internal consistency reliabilities of problem-solving test scores were consistently high (all > .87, median = .90) across different classes, students, examiners, and exams. In contrast, multiple-choice test scores were less reliable (all < .69). Recommendations are presented for improving the construction of exams in higher education.
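The abstract above pairs a reliability estimate with a confidence interval. The rough sketch below (not the authors' code) shows one generic way to do that: a percentile bootstrap wrapped around an internal-consistency estimator. Cronbach's alpha is used here only as a stand-in estimator, and the score matrix is simulated; the same wrapper would work around coefficient omega or any other estimator.

```python
# Illustrative bootstrap CI for an internal-consistency reliability estimate (assumed approach).
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students x n_items) score matrix."""
    scores = np.asarray(scores, float)
    k = scores.shape[1]
    return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                            / scores.sum(axis=1).var(ddof=1))

def reliability_ci(scores, estimator, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling whole students with replacement."""
    scores = np.asarray(scores, float)
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    boots = [estimator(scores[rng.integers(0, n, n)]) for _ in range(n_boot)]
    return np.percentile(boots, [(1 - level) / 2 * 100, (1 + level) / 2 * 100])

rng = np.random.default_rng(2)
demo = rng.normal(0, 1, (200, 1)) + rng.normal(0, 0.8, (200, 10))  # 200 students, 10 items
print(cronbach_alpha(demo), reliability_ci(demo, cronbach_alpha))
```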
In higher education courses, instructors often use mixed-format exams composed of several types of questions, such as essay, short-answer, problem-solving, and multiple-choice questions, to evaluate student performance. It is important to discriminate reliably among students according to their performance on final examinations. The lower the reliability of student exam scores, the greater the error associated with making decisions based on them. Why then have we found no previous studies of reliability for this, one of the most common types of exam? We investigated the reliability of student scores on 12 official mixed-format final exams used in 22 classes with 1012 students in six undergraduate courses taught by five professors in three fields of business (finance, accounting, and statistics). We focused on estimating internal consistency reliability, which is essentially a measure of the reproducibility of test scores. Using coefficient omega, the most appropriate measure for assessing reliability for mixed-format exams, we found that reliability in these 22 classes averaged .85, with over 90% of classes showing reliabilities above .80. These reliabilities are very high, comparable with those reported for professionally developed standardized tests and better than those reported recently for single-format, multiple-choice exams in higher education.
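For readers unfamiliar with coefficient omega, the sketch below shows the standard one-factor formula, omega = (sum of loadings)^2 / [(sum of loadings)^2 + sum of uniquenesses], computed on simulated data. Scikit-learn's FactorAnalysis is used here only as a stand-in for whatever estimation procedure the authors actually used; the data and item counts are hypothetical.

```python
# Minimal sketch of coefficient omega (total) from a one-factor model (assumed setup).
import numpy as np
from sklearn.decomposition import FactorAnalysis

def coefficient_omega(scores):
    """Omega = (sum of loadings)^2 / [(sum of loadings)^2 + sum of uniquenesses]."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)  # standardize items
    fa = FactorAnalysis(n_components=1).fit(z)
    loadings = np.abs(fa.components_[0])       # one-factor standardized loadings
    uniquenesses = fa.noise_variance_          # item-specific error variances
    return loadings.sum() ** 2 / (loadings.sum() ** 2 + uniquenesses.sum())

rng = np.random.default_rng(1)
ability = rng.normal(size=(300, 1))                                  # simulated latent ability
scores = ability @ rng.uniform(0.5, 1.0, (1, 8)) + rng.normal(0, 0.6, (300, 8))
print(f"coefficient omega = {coefficient_omega(scores):.3f}")
```

Unlike Cronbach's alpha, omega does not assume that every question contributes equally to the total score, which is why it is better suited to mixed-format exams whose items differ in type and weight.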