2021
DOI: 10.1097/acm.0000000000004028
Measuring the Effect of Examiner Variability in a Multiple-Circuit Objective Structured Clinical Examination (OSCE)

Abstract: Supplemental Digital Content is available in the text.

Cited by 22 publications (23 citation statements). References 34 publications.
“…This study confirms that examiner stringency is a very important influence on station-level scoring/grading (Tables 4 and 7), and that adjusting for this does impact on station-level scores (Table 6). These findings are consistent with a wide range of literature (Homer, 2020; McManus et al., 2006; Santen et al., 2021; Yeates et al., 2018, 2021), but our work suggests that acceptable levels of overall assessment reliability can be achieved provided the number of stations is large enough (Table 5), again consistent with other empirical and/or psychometric work (Bloch & Norman, 2012; Park, 2019). There is, however, a lot of residual variance at the station level, and these results suggest that a focus on exam-level, rather than station-level, performance of candidates is likely to be more meaningful in terms of good decision-making.…”
Section: Indicative Differences in Exam-Level Decisions (RQ3) (supporting)
confidence: 91%
“…It is well known that variation in examiner stringency is a threat to the validity of OSCE-type assessment outcomes (Bartman et al., 2013; Harasym et al., 2008; McManus et al., 2006; Yeates et al., 2018; Yeates & Sebok-Syer, 2017). In larger OSCEs, the assessment design means that candidates are usually grouped in parallel circuits and 'see' a specific set of examiners (Khan et al., 2013; Pell et al., 2010), which makes it very difficult to disentangle examiner effects from differences in candidate ability (Yeates et al., 2021; Yeates & Sebok-Syer, 2017). In a single administration of a small OSCE there might be a unique set of examiners for each cohort of candidates, but across different exam administrations the same issues of unwanted variation in scores due to examiner stringency arise.…”
Section: Introduction (mentioning)
confidence: 99%
“…We used secondary data analysis to address this aim, drawing on data from a recent study by Yeates et al.18 derived from a summative Year 3 undergraduate OSCE at Keele University Medical School. Students were studying for the MBChB qualification, a 5-year, predominantly undergraduate, course.…”
Section: Methods (mentioning)
confidence: 99%
“…Moreover, although the videos for each station were the same for all groups of examiners, the position of the embedded videos within the OSCE sequence varied between groups, with some examiners viewing a particular video early in the sequence whilst others viewed the same video late in the sequence of performances (i.e., half of the participating examiners scored videos A&B early in the sequence and videos C&D late, whilst the other half scored videos C&D early and videos A&B late). Consequently, as Yeates et al.'s18 comparisons were derived from the combined scores allocated to both early and late videos, the balanced nature of this variation in embedded video sequence would not be expected to have influenced their comparisons. Nonetheless, this variation in sequence position enables comparison of scores allocated to the same performance when scored either early or late in the assessment sequence.…”
Section: Methods (mentioning)
confidence: 99%