Background: Concern exists regarding the differential performance of candidates in postgraduate clinical assessments by ethnicity, sex, and country of primary qualification. Could examiner bias be responsible?
Aim: To explore whether candidate demographics affect examiners' judgements, by investigating candidates' case performances by candidates' and examiners' demographics.
Design and setting: Data on 4000 candidates (52 000 cases) sitting the MRCGP clinical skills assessment in 2011-2012.
Method: Univariate analyses were undertaken of subgroup performance (male/female, white/black and minority ethnic (BME), UK/non-UK graduates) by parallel examiner demographics. Because these variables are confounded, the analyses were complemented by multivariate ANOVA and multiple regression.
Results: Univariate analysis showed some differences in outcomes between same-group and other-group examiners. These were contradictory with respect to examiners 'favouring their own'; for example, male candidates received higher marks from female examiners than from male examiners. The maximum effect size was 3.6%. A six-way ANOVA confirmed that all three candidate and examiner variables had significant individual effects, and identified one significant interaction (examiner sex by examiner ethnicity). Stepwise regression showed candidate variables predicting 12% of score variance, with parallel examiner demographics adding little (approximately 0.2% of variance). One 'transactional' variable proved significant, explaining 0.06% of score variance.
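The incremental-variance logic behind these regression results can be illustrated with a minimal sketch. The data below are simulated and the variable names are hypothetical — this is not the study's dataset or code, only a demonstration of comparing R² for a candidate-only model against a model that adds an examiner demographic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative only: simulated binary demographics and scores, not MRCGP data.
n = 5000
cand_sex = rng.integers(0, 2, n)   # candidate demographics (hypothetical coding)
cand_bme = rng.integers(0, 2, n)
cand_uk = rng.integers(0, 2, n)
exam_sex = rng.integers(0, 2, n)   # parallel examiner demographic

# Candidate variables carry most of the signal; the examiner term adds almost none.
score = 70 + 3 * cand_uk - 2 * cand_bme + 0.2 * exam_sex + rng.normal(0, 5, n)

def r2(X, y):
    """R^2 of an OLS fit, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_cand = r2(np.column_stack([cand_sex, cand_bme, cand_uk]), score)
r2_full = r2(np.column_stack([cand_sex, cand_bme, cand_uk, exam_sex]), score)
print(f"candidate-only R^2: {r2_cand:.3f}")
print(f"added by examiner demographic: {r2_full - r2_cand:.4f}")
```

Because the examiner model nests the candidate model, its R² can only rise; the question the paper asks is whether that rise is large enough to matter, and here (as in the reported results) it is tiny relative to the candidate effects.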
Conclusion: Examiners showed no general tendency to 'favour their own kind'. Allowing for confounding between variables, the substantial effects on candidates' case scores relate to candidate, not examiner, characteristics. Candidate-examiner interaction effects were inconsistent in direction and slight in calculated impact.
Reliability in written examinations is taken very seriously by examination boards and candidates alike. Within general education, many factors influence reliability, including variation between markers, within markers, within candidates and within teachers. Mechanisms designed to overcome, or at least minimise, the impact of such variables are detailed. Methods of establishing reliability are also explored in the context of a range of assessment situations. In written tests of general practice within the Membership of the Royal College of General Practitioners (MRCGP) examination, considerable effort has been put into achieving acceptable levels of reliability. Current mechanisms designed to ensure high reliability are described and related to the evolution of the written component of the examination. In addition to a description of marker selection and training, and of question development and construction, a detailed example of specific and generic marking schedules is provided. Examination results for the Written Paper of the MRCGP from 1998 to 2003 are reported, including Cronbach's alpha coefficients, standard errors of measurement, mean scores (and SDs) and pass rates. In addition, individual discrimination scores for each question in the October 2002 paper are shown. The consistently high reliability of the written component of the MRCGP examination provides valuable lessons in the selection, training and monitoring of markers, as well as practical methods of moderating factors affecting candidate variability. The challenge for examination developers is to carry these important lessons forward into a modernised assessment structure for UK general practice.
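The reliability statistics reported above follow standard definitions: Cronbach's alpha compares the sum of per-question variances with the variance of total scores, and the standard error of measurement (SEM) is the score SD scaled by sqrt(1 - alpha). A minimal sketch, using invented toy marks (these numbers are not the paper's results):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_candidates, k_questions) array of marks."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each question
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def sem(total_scores, alpha):
    """Standard error of measurement: SD * sqrt(1 - alpha)."""
    return np.std(total_scores, ddof=1) * np.sqrt(1 - alpha)

# Hypothetical marks for 6 candidates on 4 questions.
marks = np.array([
    [12, 14, 11, 13],
    [ 8,  9,  7, 10],
    [15, 16, 14, 15],
    [10, 11,  9, 11],
    [ 6,  7,  6,  8],
    [13, 12, 12, 14],
])
a = cronbach_alpha(marks)
print(f"alpha = {a:.2f}, SEM = {sem(marks.sum(axis=1), a):.2f}")
```

Because the toy questions rank candidates very consistently, alpha comes out high and the SEM small; real papers with more heterogeneous items would show lower values.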
WHAT IS ALREADY KNOWN IN THIS AREA
• Simulated patients (SPs) experience stress related to their performance as SPs.
• SPs' consistency is maximised by regular training for them and the examiners.
• The likelihood of stress may be related to role type and SP acting style.
WHAT THIS WORK ADDS
• Some quantification of the problems surrounding SPs undertaking continuous effective role playing: (1) this is acceptable to them for a three-day period; (2) by the end of this period, a third of the SPs report stressful symptoms.
• Almost all role players feel better able to make judgements of clinical competence as a result of the experience; none would change their general practitioner (GP).
• An SP-based test of consultation skills, perceived as challenging by its candidates, can be regarded by them as an appropriate, realistic and acceptable test.
SUGGESTIONS FOR FUTURE RESEARCH
• Investigation of translinguistic effects (differential first language) on candidate scores in SP-based examinations.
WHAT IS ALREADY KNOWN IN THIS AREA
• The Simulated Surgery module of the MRCGP examination has been shown to be a valid and reliable assessment of clinical consulting skills.
WHAT THIS WORK ADDS
• This paper describes the further development of the Simulated Surgery methodology, showing the type of data analysis currently used to assure its quality and reliability. The measures taken to tighten case quality are discussed.
SUGGESTIONS FOR FUTURE RESEARCH
• The future development of clinical skills assessments in general practice is discussed. More work is needed on the effectiveness and reliability of lay assessors in complex integrated clinical cases. New methods are also needed to test areas that are difficult to reproduce in a simulated environment (such as acute emergencies and cases involving the very young or very old).