Introduction: Ensuring examiner equivalence across assessment locations is a priority within distributed Objective Structured Clinical Exams (OSCEs), but it is challenging because the performances judged by different groups of examiners do not overlap. Yeates et al. have developed a methodology, Video-based Examiner Score Comparison and Adjustment (VESCA), to compare and (potentially) adjust for the influence of different groups of examiners within OSCEs. Whilst initial research has been promising, the accuracy of the adjusted scores produced by VESCA is unknown. As this is critical to VESCA’s utility, we aimed to investigate the accuracy of adjusted scores produced by VESCA under a range of plausible operational parameters.
Methods: Using statistical simulation, we investigated how (1) the proportion of participating examiners, (2) the number of linking videos, (3) baseline differences in examiner stringency between schools, (4) the number of OSCE stations and (5) different degrees of random error within examiners’ judgements influenced the accuracy of adjusted scores. We generated distributions of students’ “true” performances across several stations, added examiner error, and simulated linking through crossed video-scoring, before using Many Facet Rasch Modelling to produce adjusted scores. We replicated each permutation 1000 times to determine the average error reduction and the proportion of students whose scores became more accurate.
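The simulation logic described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes two schools, replaces Many Facet Rasch Modelling with a simple mean-difference linking on the shared videos, and assumes school stringency offsets centred on zero. All names, the score scale and the default parameter values (`simulate_once`, `replicate`, `school_offsets`, `judgement_sd`, a mean of 15 and SD of 2) are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_once(n_students=60, n_stations=6, n_videos=4,
                  school_offsets=(-1.0, 1.0), judgement_sd=1.0, seed=0):
    """One replication: returns (error_reduction, proportion_more_accurate).

    Assumption: adjustment uses a simple mean-difference linking on the
    shared videos, NOT Many Facet Rasch Modelling as in the study.
    """
    rng = random.Random(seed)
    n_schools = len(school_offsets)

    # "True" performance per student, on an arbitrary score scale.
    true_scores = [[rng.gauss(15.0, 2.0) for _ in range(n_students)]
                   for _ in range(n_schools)]

    # Observed score: mean over stations of true + school offset + random error.
    def observe(true, offset):
        return statistics.fmean(true + offset + rng.gauss(0.0, judgement_sd)
                                for _ in range(n_stations))

    observed = [[observe(t, school_offsets[s]) for t in true_scores[s]]
                for s in range(n_schools)]

    # Linking: examiners from every school score the same video performances.
    video_true = [rng.gauss(15.0, 2.0) for _ in range(n_videos)]
    video_means = [statistics.fmean(v + school_offsets[s]
                                    + rng.gauss(0.0, judgement_sd)
                                    for v in video_true)
                   for s in range(n_schools)]
    grand_mean = statistics.fmean(video_means)
    est_offsets = [m - grand_mean for m in video_means]

    # Compare absolute error before and after removing the estimated offset.
    raw_err, adj_err = [], []
    for s in range(n_schools):
        for t, o in zip(true_scores[s], observed[s]):
            raw_err.append(abs(o - t))
            adj_err.append(abs(o - est_offsets[s] - t))

    error_reduction = 1.0 - statistics.fmean(adj_err) / statistics.fmean(raw_err)
    more_accurate = sum(a < r for a, r in zip(adj_err, raw_err)) / len(raw_err)
    return error_reduction, more_accurate


def replicate(n_reps=200, **kwargs):
    """Average both outcome measures over n_reps independent replications."""
    results = [simulate_once(seed=rep, **kwargs) for rep in range(n_reps)]
    return (statistics.fmean(r[0] for r in results),
            statistics.fmean(r[1] for r in results))


if __name__ == "__main__":
    print(replicate(n_reps=200, school_offsets=(-1.0, 1.0), n_videos=4))
```

Varying `school_offsets`, `n_videos`, `n_stations` and `judgement_sd` in this sketch mirrors, in simplified form, the parameters the study manipulated; it does not model the proportion of participating examiners.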
Results: Under all conditions where no baseline difference existed between groups of examiners (i.e. random rather than systematic variance), score adjustment minimally improved or worsened score accuracy. Conversely, as modelled (systematic) baseline differences between schools increased, adjustment accuracy increased, reducing error by up to 71% and making scores more accurate for up to 93% of students in the 20% baseline-difference condition.
Conclusions: Score adjustment through VESCA will substantially enhance equivalence for candidates in distributed OSCEs when 10–20% baseline differences exist between examiners in different schools. As such differences are plausible in practice, consideration should be given to the use of VESCA in large-scale or national exams.