Although each of the perspectives discussed in this paper advances our understanding of assessor cognition and its impact on WBA, every perspective has its limitations. Following a discussion of areas of concordance and discordance across the perspectives, we propose a coexistent view in which researchers and practitioners utilise aspects of all three perspectives with the goal of advancing assessment quality and ultimately improving patient care.
We explore the potential implications for measurement outcomes in rater-based assessments when raters form differing categorizations of ratees and rating scales are then used to collect these categorical judgments.
We propose that a supervisor's perceived responsibility for the ward underlies adjustments between 'hands-on' (i.e. personal ward responsibility) and 'hands-off' (i.e. shared ward responsibility) styles. Our model of approaches to clinical supervision combines this responsibility tension with the tension between patient care and teaching to illustrate four supervisory approaches, each with unique priorities influencing entrustment. Given the fluidity of supervision, documenting changes in oversight strategies, rather than absolute levels of entrustment, may be more informative for assessment purposes. Research is needed to determine whether the supervision provided, the entrustment decision made and the supervisor's trust in a trainee are sufficiently associated to serve as proxies in assessing a trainee's competence.
The implementation of Entrustable Professional Activities has led to the simultaneous development of assessment based on a supervisor's entrustment of a learner to perform these activities without supervision. While entrustment may be intuitive when we consider the direct observation of a procedural task, the current implementation of entrustability-based rating scales for internal medicine's non-procedural tasks may not translate into meaningful learner assessment. In this Perspectives article, we outline a number of potential concerns with ad hoc entrustability assessments in internal medicine postgraduate training: differences in the scope of procedural vs. non-procedural tasks, acknowledgement of the type of clinical oversight common within internal medicine, and the limitations of entrustment language. We point towards potential directions for inquiry that would require us to clarify the purpose of the entrustability assessment, reconsider each of the fundamental concepts of entrustment in internal medicine supervision and explore the use of descriptive rather than numeric assessment approaches.
Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting 'idiosyncratic rater variance' is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical assessments have used open response formats to gather raters' comments and justifications. This design choice allows participants to use idiosyncratic response styles that could result in a distorted representation of the underlying rater cognition and skew subsequent analyses. In this study we explored rater variability using the structured response format of Q methodology. Physician raters viewed video-recorded clinical performances and provided Mini Clinical Evaluation Exercise (Mini-CEX) assessment ratings through a web-based system. They then shared their assessment impressions by sorting statements that described the most salient aspects of the clinical performance onto a forced quasi-normal distribution ranging from "most consistent with my impression" to "most contrary to my impression". Analysis of the resulting Q-sorts revealed, for each performance, distinct points of view, each shared by multiple physicians. The points of view corresponded with the ratings physicians assigned to the performance. Each point of view emphasized different aspects of the performance, with rapport-building and/or medical expertise skills being most salient. It was rare for the points of view to diverge based on disagreements regarding the interpretation of a specific aspect of the performance. As a result, physicians' divergent points of view on a given clinical performance cannot easily be reconciled into a single coherent assessment judgment that is merely distorted by measurement error. If inter-rater variability does not wholly reflect measurement error, this is problematic for our current measurement models and poses challenges for how we should analyze performance assessment ratings.
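The forced-distribution Q-sort described above can be made concrete with a small simulation. The sketch below is purely illustrative: the number of statements, rank range, rater counts and data are all invented, and a plain by-person correlation matrix stands in for the factor analysis typical of Q methodology; nothing here reproduces the study's actual analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forced quasi-normal template: how many statements may receive each rank.
ranks = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
slots = np.array([2, 3, 4, 5, 6, 5, 4, 3, 2])  # 34 statements in total

def simulated_q_sort(viewpoint, noise=2.0):
    """One rater's Q-sort: a rank per statement, respecting the forced
    distribution, perturbed around a shared 'viewpoint' ordering."""
    scores = viewpoint + rng.normal(0, noise, viewpoint.size)
    order = np.argsort(scores)               # least to most agreed-with
    q_sort = np.empty(viewpoint.size)
    q_sort[order] = np.repeat(ranks, slots)  # pour ranks into the template
    return q_sort

n_statements = int(slots.sum())
viewpoint_a = rng.normal(0, 3, n_statements)  # e.g. rapport-building most salient
viewpoint_b = rng.normal(0, 3, n_statements)  # e.g. medical expertise most salient

# Six raters, three per simulated point of view.
q_sorts = np.array([simulated_q_sort(v)
                    for v in [viewpoint_a] * 3 + [viewpoint_b] * 3])

# Correlate raters' whole sorts (by-person correlation); the block structure
# in this matrix is what a Q-methodology factor analysis would extract as
# distinct, shared points of view.
print(np.round(np.corrcoef(q_sorts), 2))
```

Under these assumptions, the printed matrix shows high correlations within each group of three raters and low correlations across groups, mirroring the finding that divergent ratings cluster into shared points of view rather than scattering as random error.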