Interrater agreement and predictive validity of faculty ratings of pediatric residents

Davis, J K; Inamdar, Sarla; Stone, R K

doi:10.1097/00001888-198611000-00006

Cited by 11 publications

(25 citation statements)

References 0 publications


“…The reliability of similar types of assessment processes have been variable, although many have claimed adequate reliability (Keck and Arnold 1979; Kwolek et al 1997; Magzoub et al 1998; Kreiter et al 1998; Nasca et al 2002; Durning et al 2005; Beckman et al 2006; Cohen et al 2009; Kreiter et al 1998), others have been either equivocal (Cowles and Kubany 1959; Hull et al 1995; Schwanz et al 1995; Williams et al 2004), or found the reliability not acceptable (Levine and McGuire 1971; Davis et al 1986; Thompson et al 1990; Metheny 1991; Ryan et al 1996; Pulito et al 2007; Searle 2008). A common problem with many of the studies claiming reliability for this form of competency assessment has been the inappropriate use of the alpha coefficient for nested and/or unbalanced designs which appears to be common for workplace-based assessments (Keck and Arnold 1979; Magzoub et al 1998; Nasca et al 2002; Durning et al 2005; Cohen et al 2009).…”

Section: Comparison To Other Studies

mentioning

confidence: 99%

Supervisor assessment of clinical and professional competence of medical trainees: a reliability study using workplace data and a focused analytical literature review

McGill

Vleuten

Clarke

2011

Adv in Health Sci Educ


Even though rater-based judgements of clinical competence are widely used, they are context sensitive and vary between individuals and institutions. To deal adequately with rater-judgement unreliability, evaluating the reliability of workplace rater-based assessments in the local context is essential. Using such an approach, the primary intention of this study was to identify the trainee score variation around supervisor ratings, identify sampling number needs of workplace assessments for certification of competence and position the findings within the known literature. This reliability study of workplace-based supervisors' assessments of trainees has a rater-nested-within-trainee design. Score variation attributable to the trainee for each competency item assessed (variance component) was estimated by the minimum-norm quadratic unbiased estimator. Score variance was used to estimate the number needed for a reliability value of 0.80. The trainee score variance for each of 14 competency items varied between 2.3% for emergency skills to 35.6% for communication skills, with an average for all competency items of 20.3%; the "Overall rating" competency item trainee variance was 28.8%. These variance components translated into 169, 7, 17 and 28 assessments needed for a reliability of 0.80, respectively. Most variation in assessment scores was due to measurement error, ranging from 97.7% for emergency skills to 63.4% for communication skills. Similar results have been demonstrated in previously published studies. In summary, overall supervisors' workplace based assessments have poor reliability and are not suitable for use in certification processes in their current form. The marked variation in the supervisors' reliability in assessing different competencies indicates that supervisors may be able to assess some with acceptable reproducibility; in this case communication and possibly overall competence. However, any continued use of this format for assessment of trainee competencies necessitates the identification of what supervisors in different institutions can reliably assess rather than continuing to impose false expectations from unreliable assessments.
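The step from a trainee variance share to a required number of assessments can be illustrated with a short sketch. The abstract does not state the exact formula the authors used, so the code below assumes the textbook Spearman-Brown relation for the mean of n independent ratings, with everything other than trainee variance treated as error. The function names are hypothetical, and the outputs are illustrative only; they are not expected to match the counts reported in the abstract.

```python
def assessments_needed(trainee_share: float, target: float = 0.80) -> float:
    """Ratings per trainee needed for the mean rating to reach `target` reliability.

    Assumption (not taken from the abstract): reliability of the mean of n
    ratings is n*p / (n*p + (1 - p)), where p is the share of score variance
    attributable to the trainee. Solving for n gives the expression below.
    """
    if not 0.0 < trainee_share < 1.0:
        raise ValueError("trainee_share must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    error_share = 1.0 - trainee_share
    return (target / (1.0 - target)) * (error_share / trainee_share)


def reliability_of_mean(trainee_share: float, n: float) -> float:
    """Reliability of the mean of n ratings under the same assumption."""
    signal = n * trainee_share
    return signal / (signal + (1.0 - trainee_share))


if __name__ == "__main__":
    # Trainee variance shares quoted in the abstract (as proportions).
    for label, share in [("emergency skills", 0.023),
                         ("communication skills", 0.356)]:
        n = assessments_needed(share)
        print(f"{label}: about {n:.1f} ratings for reliability 0.80")
```

Under this assumption the required number grows as (1 - p) / p, which is why a competency with a very small trainee variance share needs far more ratings than one with a large share.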


“…[1][2][3][4][5][6][7][8][9][10] It has been argued that no single tool can assess all of these elements with adequate reliability and validity.…”

mentioning

confidence: 99%

“…5,6,10,14,15 There should be a combination of subdomain assessments and an overall mark. 3,5,12,[16][17][18][19] Good GAF design does not guarantee reliable and valid evaluations.…”

mentioning

confidence: 99%

“…3,5,12,[16][17][18][19] Good GAF design does not guarantee reliable and valid evaluations. 3 Reliability depends on rater experience. 2,5,12,18,20 Raters seem unable to evaluate a trainee on more than one or two independent dimensions; one study found that 70% of the observed variance in assessors' scores was explained by the variance in two items on the assessment form.…”

mentioning

confidence: 99%

“…3 Reliability depends on rater experience. 3,5,12,21 It is likely that the GAF assesses more components of competence than knowledge alone, so it is not clear that the correlation of the overall GAF score with a knowledge test should be high. 12,16,18,[22][23][24] In tests of predictive validity, GAFs are most commonly compared with tests of knowledge such as multiple-choice and short-answer examinations, where they have demonstrated low to modest correlations of −0.18 to 0.40.…”

mentioning

confidence: 99%


Predictive Validity of the Global Assessment Form Used in a Final-year Undergraduate Rotation in Emergency Medicine

2002


Faculty Assessment of Emergency Medicine Resident Grit: A Multicenter Study

Olson

Olson

Williamson

et al. 2018

AEM Education and Training


Background: Assessment of trainees' competency is challenging; the predictive power of traditional evaluations is debatable especially in regard to noncognitive traits. New assessments need to be sought to better understand affective areas like personality. Grit, defined as "perseverance and passion for long-term goals," can assess aspects of personality. Grit predicts educational attainment and burnout rates in other populations and is accurate with an informant report version. Self-assessments, while useful, have inherent limitations. Faculty's ability to accurately assess trainees' grit could prove helpful in identifying learner needs and avenues for further development. Objective: This study sought to determine the correlation between EM resident self-assessed and faculty-assessed Grit Scale (Grit-S) scores of that same resident. Methods: Subjects were PGY-1 to -4 EM residents and resident-selected faculty as part of a larger multicenter trial involving 10 EM residencies during 2017. The Grit-S Scale was administered to participating EM residents; an informant version was completed by their self-selected faculty. Correlation coefficients were computed to assess the relationship between residents' self-assessed and the residents' faculty-assessed Grit-S score.
