Evidence of stable standard setting results across panels or occasions is an important part of the validity argument for an established cut score. Unfortunately, because of the high cost of convening multiple panels of content experts, standards often are based on the recommendations of a single panel of judges. This approach implicitly assumes that variability across panels will be modest, but little evidence is available to support that assumption. This article examines the stability of Angoff standard setting results across panels. Data were collected for six independent standard setting exercises, with three panels participating in each exercise. The results show that although in some cases the panel effect is negligible, for four of the six data sets the panel facet represented a large portion of the overall error variance. Ignoring the often hidden panel/occasion facet can result in artificially optimistic estimates of cut score stability. Results based on a single panel should not be viewed as a reasonable estimate of the results that would be found over multiple panels. Instead, the variability observed within a single panel is best viewed as a lower bound on the variability expected when the exercise is replicated.
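To make the point about the hidden panel facet concrete, the following sketch (not the authors' actual analysis, and using simulated ratings rather than the study data) shows how a single-panel standard error can understate the replication error once between-panel variance is included. The array shapes, panel/judge/item counts, and Beta-distributed ratings are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Angoff ratings: 3 panels x 10 judges x 50 items; each entry is
# a judge's estimated probability that a minimally proficient examinee
# answers the item correctly. (Simulated for illustration only.)
n_panels, n_judges, n_items = 3, 10, 50
ratings = rng.beta(4, 3, size=(n_panels, n_judges, n_items))

# Each judge's recommended cut score: sum of probabilities over items.
judge_cuts = ratings.sum(axis=2)          # shape (panels, judges)

# Each panel's cut score: mean of its judges' recommendations.
panel_cuts = judge_cuts.mean(axis=1)      # shape (panels,)

# Simple one-way variance decomposition (judges nested within panels):
# within-panel (judge) variance vs. between-panel variance.
within_var = judge_cuts.var(axis=1, ddof=1).mean()
between_var = max(panel_cuts.var(ddof=1) - within_var / n_judges, 0.0)

# Standard error of a single panel's cut score with and without the panel
# facet: dropping the between-panel term understates the replication error.
se_ignoring_panels = np.sqrt(within_var / n_judges)
se_with_panels = np.sqrt(between_var + within_var / n_judges)

print(f"panel cut scores: {np.round(panel_cuts, 1)}")
print(f"SE ignoring panel facet:  {se_ignoring_panels:.2f}")
print(f"SE including panel facet: {se_with_panels:.2f}")
```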
Evidence to support the credibility of standard setting procedures is a critical part of the validity argument for decisions made based on tests that are used for classification. One area in which there has been limited empirical study is the impact of standard setting judge selection on the resulting cut score. One important issue related to judge selection is whether the extent of judges' content knowledge affects their perceptions of the probability that a minimally proficient examinee will answer an item correctly. The present article reports on two studies conducted in the context of Angoff-style standard setting for medical licensing examinations. In the first study, content experts answered and subsequently provided Angoff judgments for a set of test items. After accounting for perceived item difficulty and judge stringency, answering the item correctly had a significant (and potentially important) effect on expert judgment. The second study examined whether providing judges with the correct answer would produce an effect similar to that associated with knowing the correct answer. The results suggested that providing the correct answer did not affect judgments. These results have important implications for the validity of standard setting outcomes in general and for judge recruitment specifically.
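A minimal sketch of the kind of adjusted comparison described in the first study is shown below; it is not the authors' model and uses simulated data. Item fixed effects stand in for perceived item difficulty and judge fixed effects for judge stringency, so the coefficient on the (hypothetical) `correct` flag reflects the adjusted association between answering an item correctly and the Angoff rating given to it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical long-format data: one row per judge-by-item Angoff rating,
# with a flag for whether the judge answered the item correctly.
n_judges, n_items = 12, 40
rows = []
for j in range(n_judges):
    for i in range(n_items):
        correct = rng.random() < 0.75
        rating = 0.6 + 0.05 * correct + rng.normal(0, 0.1)
        rows.append({"judge": j, "item": i, "correct": int(correct),
                     "rating": float(np.clip(rating, 0, 1))})
df = pd.DataFrame(rows)

# Item fixed effects absorb perceived item difficulty; judge fixed effects
# absorb judge stringency. The 'correct' coefficient is then the adjusted
# difference in ratings associated with answering the item correctly.
model = smf.ols("rating ~ C(item) + C(judge) + correct", data=df).fit()
print(f"adjusted effect of answering correctly: {model.params['correct']:.3f} "
      f"(p = {model.pvalues['correct']:.3f})")
```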
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, content experts may at times have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or highly technical, or when non-expert stakeholders are included in a standard setting panel (e.g., parents, administrators, or union representatives). When judges lack expertise regarding specific exam content, the ratings associated with those items may be biased. This study illustrates the impact of rating unfamiliar items on Angoff passing scores. The study compares Angoff ratings for typical items with ratings for items identified by judges as containing unfamiliar content. The results indicate that judges tend to perceive unfamiliar items as artificially difficult, resulting in systematically lower Angoff ratings. The results suggest that when judges are forced to rate unfamiliar items, the validity of the resulting classification decisions may be jeopardized.
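A simple sketch of the familiar-versus-unfamiliar comparison described above follows; it uses simulated ratings (with an assumed downward shift of 0.08 for unfamiliar items), not the study's data, and a Welch t-test rather than whatever analysis the authors actually performed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical judge-by-item Angoff ratings, with a per-judge flag marking
# items whose content the judge reported as unfamiliar.
n_judges, n_items = 10, 60
ratings = rng.beta(5, 3, size=(n_judges, n_items))
unfamiliar = rng.random((n_judges, n_items)) < 0.15

# Assumed effect for illustration: unfamiliar items are rated lower because
# judges perceive them as artificially difficult.
ratings = np.clip(np.where(unfamiliar, ratings - 0.08, ratings), 0, 1)

familiar_mean = ratings[~unfamiliar].mean()
unfamiliar_mean = ratings[unfamiliar].mean()
t, p = stats.ttest_ind(ratings[unfamiliar], ratings[~unfamiliar],
                       equal_var=False)

print(f"mean rating, familiar items:   {familiar_mean:.3f}")
print(f"mean rating, unfamiliar items: {unfamiliar_mean:.3f}")
print(f"Welch t = {t:.2f}, p = {p:.3f}")
```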