2020
DOI: 10.1007/s10459-020-09990-x

Re-conceptualising and accounting for examiner (cut-score) stringency in a ‘high frequency, small cohort’ performance test

Abstract: Variation in examiner stringency is an ongoing problem in many performance settings such as OSCEs, and is usually conceptualised and measured based on the scores/grades examiners award. Under borderline regression, the standard within a station is set using checklist/domain scores and global grades acting in combination. This complexity requires a more nuanced view of what stringency might mean when considering sources of variation in cut-scores across stations. This study uses data from 349 administrations of an …
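For readers unfamiliar with the standard-setting method the abstract refers to, the following is a minimal sketch of a borderline regression cut-score for a single station: checklist/domain scores are regressed on global grades, and the predicted checklist score at the "borderline" grade becomes the station cut-score. The numeric grade coding, function name, and data below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a borderline regression (BRM) cut-score for one OSCE station.
# Assumption: global grades are coded numerically (e.g. 0 = fail, 1 = borderline,
# 2 = pass, 3 = good, 4 = excellent); the data below are invented for illustration.
import numpy as np

def borderline_regression_cut_score(checklist_scores, global_grades, borderline_grade=1.0):
    """Regress checklist scores on global grades and return the predicted
    checklist score at the borderline grade (the station cut-score)."""
    slope, intercept = np.polyfit(global_grades, checklist_scores, deg=1)
    return slope * borderline_grade + intercept

# One row per candidate in a single administration of the station.
grades = np.array([0, 1, 1, 2, 2, 3, 3, 4, 4, 4])
scores = np.array([8, 11, 12, 15, 16, 18, 19, 22, 23, 24])

print(f"Station cut-score: {borderline_regression_cut_score(scores, grades):.1f}")
```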

Cited by 6 publications (13 citation statements)
References: 28 publications
“…This study confirms that examiner stringency is a very important influence on station-level scoring/grading (Tables 4 and 7), and that adjusting for this does impact on station-level scores (Table 6). These findings are consistent with a wide range of literature (Homer, 2020; McManus et al., 2006; Santen et al., 2021; Yeates et al., 2018, 2021), but our work suggests that acceptable levels of overall assessment reliability can be achieved provided the number of stations is large enough (Table 5), again consistent with other empirical and/or psychometric work (Bloch & Norman, 2012; Park, 2019). There is, however, a lot of residual variance at the station level, and these results suggest that a focus on exam-level, rather than station-level, performance of candidates is likely to be more meaningful in terms of good decision-making.…”
Section: Indicative Differences in Exam-Level Decisions (RQ3) · supporting
confidence: 91%
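The point in this citation statement that reliability becomes acceptable "provided the number of stations is large enough" can be illustrated with the standard Spearman-Brown projection, treating stations as parallel components. The sketch below uses a hypothetical single-station reliability and is not the cited papers' own analysis.

```python
# Standard Spearman-Brown projection of exam reliability as the number of
# stations grows. The single-station reliability value is hypothetical and
# only illustrates why station-level noise matters less at exam level.

def spearman_brown(single_station_reliability: float, n_stations: int) -> float:
    """Projected reliability of an exam composed of n parallel stations."""
    r, k = single_station_reliability, n_stations
    return (k * r) / (1 + (k - 1) * r)

r_single = 0.15  # hypothetical reliability of a single station
for n in (6, 12, 18, 24):
    print(f"{n:2d} stations -> projected exam reliability {spearman_brown(r_single, n):.2f}")
```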
“…We argue that the statistical methods used here are valuable in quantifying error and its impact on the exam overall, but we can never be truly confident that all sources of error have been captured and accounted for properly. This in turn implies that adjusting candidate-level scores and using these for actual decision-making is hard to justify, as has been argued elsewhere (Homer, 2020).…”
Section: Study Limitations and Final Conclusion · mentioning
confidence: 99%
“…As DRIFT effects might result in additional station fails for some students, this could produce unwarranted failure for some candidates. If determined to be of sufficient importance in some instances, this effect could be mitigated by either adjusting students' station-level scores or the station-level pass mark [43] […] exams over a programme. Importantly, the small (and inconsistently observed) magnitude of the effect we have found in this study may be considered insufficiently important to warrant alterations of this nature, given that other effects (such as the number of OSCE stations [44]) are known to have a greater influence on the reliability of the test.…”
Section: Practical Implications · mentioning
confidence: 99%
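As a purely illustrative aside on the two mitigation options mentioned in this statement, the sketch below shows the arithmetic of removing an estimated drift effect either from a candidate's station score or from the station pass mark. The drift estimate and all values are hypothetical, and this is not code from the cited study.

```python
# Hedged sketch of the two mitigation routes named above: compensate for an
# estimated drift effect at the score level or at the pass-mark level.
# The drift estimate (e.g. from modelling scores against administration order)
# is assumed to exist already; all numbers here are invented.

def adjust_score_for_drift(raw_score: float, estimated_drift: float) -> float:
    """Option 1: remove the estimated drift from the candidate's station score."""
    return raw_score - estimated_drift

def adjust_pass_mark_for_drift(pass_mark: float, estimated_drift: float) -> float:
    """Option 2: leave scores untouched and shift the station pass mark instead."""
    return pass_mark + estimated_drift

drift = 0.6   # hypothetical leniency drift inflating later candidates' scores
print(adjust_score_for_drift(15.2, drift))      # 14.6: score brought back down
print(adjust_pass_mark_for_drift(14.0, drift))  # 14.6: or raise the bar instead
```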